02 Preparing the environment

zoi edited this page May 20, 2022 · 27 revisions

Pulling the Image

To develop or test with BDE, first download the container image using docker pull:

docker pull ghcr.io/databloom-ai/bde:main

Running the Image

Once the image has been pulled, you can start BDE with docker run:

docker run -p 8888:8888 -v ${HOME}/.ivy2:/home/jovyan/.ivy2 -v ${PWD}:/home/jovyan/files ghcr.io/databloom-ai/bde:main
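In the command above, -p maps a host port to port 8888 inside the container, and the two -v options mount your local .ivy2 folder and your current folder into the container. If port 8888 is already taken on your machine, only the host side of the mapping needs to change. The following is a minimal sketch of that idea; the helper name bde_run_command is hypothetical and not part of BDE, and it only prints the command rather than running it:

```shell
# Hypothetical helper (not part of BDE): print the docker run command
# for a chosen host port. The container side of the mapping stays 8888.
bde_run_command() {
  host_port="${1:-8888}"   # default to the port used in this page
  printf 'docker run -p %s:8888 -v "%s/.ivy2":/home/jovyan/.ivy2 -v "%s":/home/jovyan/files ghcr.io/databloom-ai/bde:main\n' \
    "$host_port" "$HOME" "$PWD"
}

# Print the command for host port 9999 so it can be reviewed and pasted.
bde_run_command 9999
```

When a different host port is used, the URLs printed by the server will still mention 8888, because that is the port inside the container; replace it with the host port you chose when opening the URL.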

Your terminal will then show output similar to the following:

[I 2022-03-23 15:19:19.232 ServerApp] Jupyter Server 1.13.5 is running at:
[I 2022-03-23 15:19:19.232 ServerApp] http://742d2a5f9073:8888/lab?token=cb9756f220f065422fed5bcb8f7cb32523b5b8beaa28fba1
[I 2022-03-23 15:19:19.232 ServerApp]  or http://127.0.0.1:8888/lab?token=cb9756f220f065422fed5bcb8f7cb32523b5b8beaa28fba1
[I 2022-03-23 15:19:19.232 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2022-03-23 15:19:19.237 ServerApp] 
    
    To access the server, open this file in a browser:
        file:///home/jovyan/.local/share/jupyter/runtime/jpserver-8-open.html
    Or copy and paste one of these URLs:
        http://742d2a5f9073:8888/lab?token=cb9756f220f065422fed5bcb8f7cb32523b5b8beaa28fba1
     or http://127.0.0.1:8888/lab?token=cb9756f220f065422fed5bcb8f7cb32523b5b8beaa28fba1

To access Jupyter Studio, open the URL that starts with http://127.0.0.1 in your browser. It has the following form:

http://127.0.0.1:8888/lab?token={YOUR_TOKEN}

YOUR_TOKEN is generated when the server starts and is different each time you run the command.
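Because the token changes on every run, it can be convenient to pull the URL out of the log text instead of copying it by hand. The following is a minimal sketch, assuming the log lines keep the format shown above; the helper name extract_lab_url is hypothetical:

```shell
# Hypothetical helper: read server log text on stdin and print the first
# http://127.0.0.1 URL that carries a token.
extract_lab_url() {
  grep -o 'http://127\.0\.0\.1:[0-9]*/lab?token=[0-9a-f]*' | head -n 1
}

# Example using a log line in the format shown above.
sample_log='[I 2022-03-23 15:19:19.232 ServerApp]  or http://127.0.0.1:8888/lab?token=cb9756f220f065422fed5bcb8f7cb32523b5b8beaa28fba1'
printf '%s\n' "$sample_log" | extract_lab_url
```

If the container runs in the background, the same helper could be fed from docker logs, for example docker logs CONTAINER_ID 2>&1 | extract_lab_url, where CONTAINER_ID is a placeholder for your own container.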

Download files as required

The Notebook already comes with several example plans. To run them, download the files below and place them in your current folder (the output of pwd), which is mounted in the container as the virtual folder "files":

  • To run k-means, download US Census as "census.txt" from here: US Census Income

  • To run SGD, download HIGGS Dataset and uncompress it as "HIGGS.csv" from here: HIGGS Dataset

  • To run TPCH Hybrid Query 3, run the following commands:

# Pull the image to generate the data
docker pull ghcr.io/databloom-ai/tpch-docker:main
# Generate the files in target folder
docker run -it  -v "$(pwd)":/data ghcr.io/databloom-ai/tpch-docker:main -h

See "/databloom-ai/TPCH-Docker" for more information about the additional parameters for generating the data.

  • Then install Postgres and migrate the generated data to the database.
  • Remember to update the example plan by setting configuration.setProperty("wayang.postgres.jdbc.url", YOUR_JDBC_CONNECTION) to match your database.
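The value of YOUR_JDBC_CONNECTION follows the usual PostgreSQL JDBC URL form, jdbc:postgresql://HOST:PORT/DATABASE. The following is a minimal sketch for assembling it; the helper name postgres_jdbc_url and the database name tpch are assumptions for illustration, and 5432 is the default Postgres port:

```shell
# Hypothetical helper: print a PostgreSQL JDBC URL from host, port and
# database name. The defaults are illustrative, not taken from BDE.
postgres_jdbc_url() {
  host="${1:-localhost}"
  port="${2:-5432}"          # default Postgres port
  database="${3:-postgres}"
  printf 'jdbc:postgresql://%s:%s/%s\n' "$host" "$port" "$database"
}

# Example: a database named "tpch" (assumed name) on the default port.
postgres_jdbc_url localhost 5432 tpch
```

Keep in mind that the notebook runs inside the container, so the host name in the URL must be one that the container can reach; localhost there refers to the container itself.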

Please continue to the "Working with Jupyter Studio" page.