02 Preparing the environment
To develop or test with BDE, you first need to download the container image using docker pull:
docker pull ghcr.io/databloom-ai/bde:main
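Optionally, you can verify that the image is now available locally (the repository name below simply narrows the listing to the BDE image):
# List the locally available BDE images
docker images ghcr.io/databloom-ai/bde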
Once the image pull completes, you can start BDE with docker run. The command below publishes Jupyter's port 8888 and mounts your local Ivy cache and your current directory into the container:
docker run -p 8888:8888 -v ${HOME}/.ivy2:/home/jovyan/.ivy2 -v ${PWD}:/home/jovyan/files ghcr.io/databloom-ai/bde:main
Your terminal will then show output similar to:
[I 2022-03-23 15:19:19.232 ServerApp] Jupyter Server 1.13.5 is running at:
[I 2022-03-23 15:19:19.232 ServerApp] http://742d2a5f9073:8888/lab?token=cb9756f220f065422fed5bcb8f7cb32523b5b8beaa28fba1
[I 2022-03-23 15:19:19.232 ServerApp] or http://127.0.0.1:8888/lab?token=cb9756f220f065422fed5bcb8f7cb32523b5b8beaa28fba1
[I 2022-03-23 15:19:19.232 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2022-03-23 15:19:19.237 ServerApp]
To access the server, open this file in a browser:
file:///home/jovyan/.local/share/jupyter/runtime/jpserver-8-open.html
Or copy and paste one of these URLs:
http://742d2a5f9073:8888/lab?token=cb9756f220f065422fed5bcb8f7cb32523b5b8beaa28fba1
or http://127.0.0.1:8888/lab?token=cb9756f220f065422fed5bcb8f7cb32523b5b8beaa28fba1
To access Jupyter Studio, copy the URL of the form shown below into your browser:
http://127.0.0.1:8888/lab?token={YOUR_TOKEN}
YOUR_TOKEN is generated at startup and is different on each execution of the command.
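If you lose the token later (for example, after closing the terminal), you can usually recover it from the container logs, since the Jupyter server prints the tokenized URLs at startup; CONTAINER_ID below is a placeholder for the ID reported by docker ps:
# Find the ID of the running BDE container
docker ps
# Reprint the startup output, including the URLs with the token
docker logs CONTAINER_ID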
The notebook already comes with several example plans. To run them, download the files below and place them in your current working directory (mounted in the container as the folder "files"):
- To run k-means, download the US Census dataset as "census.txt" from here: US Census Income
- To run SGD, download the HIGGS dataset and uncompress it as "HIGGS.csv" from here: HIGGS Dataset
- To run TPCH Hybrid Query 3, run the following commands:
# Pull the image to generate the data
docker pull ghcr.io/databloom-ai/tpch-docker:main
# Generate the data files in the target folder
docker run -it -v "$(pwd)":/data ghcr.io/databloom-ai/tpch-docker:main -h
See "/databloom-ai/TPCH-Docker" for more information about additional parameters for generating the data.
- Then, install PostgreSQL and migrate the generated data into the database (a rough sketch is shown after this list).
- Remember to update the example plan's setting configuration.setProperty("wayang.postgres.jdbc.url", YOUR_JDBC_CONNECTION) so that it points to your database.
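As a rough guide for the migration step, the following sketch loads one generated table into PostgreSQL. It assumes PostgreSQL and its client tools are installed, that the TPC-H table definitions (for example dss.ddl from the TPC-H toolkit) are at hand, and that the database name "tpch" and file names are only illustrative:
# Create a database for the TPC-H data and apply the table definitions
createdb tpch
psql -d tpch -f dss.ddl
# dbgen writes a trailing '|' on every row; strip it, then bulk-load the table
sed 's/|$//' lineitem.tbl > /tmp/lineitem.tbl
psql -d tpch -c "\copy lineitem FROM '/tmp/lineitem.tbl' WITH (DELIMITER '|')"
Repeat the load for each generated .tbl file. Note that, from inside the BDE container, "localhost" refers to the container itself, so the JDBC URL in the example plan must point to a host that is reachable from the container.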
Once the environment is ready, please continue to the "Working with Jupyter Studio" page.