- `backend-start.sh`: start the Hillview back-end service on the local machine
- `demo-data-cleaner.sh`: download a small test dataset and preprocess it
- `force-gc.sh`: ask a Java process to execute a GC
- `forever.sh`: run another command in a loop forever (see the example after this list)
- `frontend-start.sh`: start the Hillview front-end service on the local machine
- `install-dependencies.sh`: install all dependencies needed to build or develop Hillview
- `install.sh`: install the binary release of Hillview on a U*x system
- `lib.sh`: a small library of useful shell functions used by other scripts
- `package-binaries.sh`: build an archive with the executables and scripts that is used for the code distribution
- `rebuild.sh`: build the Hillview front-end and back-end
- `redeploy.sh`: perform four consecutive actions on a remote Hillview installation: stop the services, rebuild the software, deploy it, and restart the services
- `upload-file.sh`: given a CSV file, guess a schema for it and upload it to a remote cluster, chopped into small pieces
- `dump-greenplum.sh`: connect Hillview to Greenplum distributed databases; it should be installed on each Greenplum worker machine
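As an illustration (an assumption: `forever.sh` presumably receives the command to run as its arguments, which the description above does not spell out), one could keep the back-end service running across crashes with:

```
$ ./forever.sh ./backend-start.sh
```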
The following are templates that are used to generate the actual shell scripts on a remote cluster when Hillview is installed:

- `hillview-aggregator-manager-template.sh`: used to generate a file called `hillview-aggregator-manager.sh`, which can be used to start, stop, or query a Hillview aggregation service. The generated file is installed on each aggregator machine.
- `hillview-webserver-manager-template.sh`: used to generate a file called `hillview-webserver-manager.sh`, which can be used to start, stop, or query the Hillview web server. The generated file is installed on the remote Hillview web server machine.
- `hillview-worker-manager-template.sh`: used to generate a file called `hillview-worker-manager.sh`, which can be used to start, stop, or query a Hillview worker. The generated file is installed on each remote worker machine (see the example after this list).
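For instance, on a worker machine the generated script might be driven as follows (a sketch: the descriptions above only say that these scripts can start, stop, and query a service, so the exact verbs are an assumption):

```
$ ./hillview-worker-manager.sh start   # start the worker on this machine
$ ./hillview-worker-manager.sh stop    # stop it again
```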
- `install-hillview.ps1`: a PowerShell script used to download and install Hillview on a Windows machine
- `detect-java.bat`: a Windows batch file library that detects where Java is installed
- `hillview-start.bat`: a Windows batch file which starts Hillview on the local machine
- `hillview-stop.bat`: a Windows batch file which stops Hillview on the local machine
- `delete-data.py`: delete a folder from all machines in a Hillview cluster
- `deploy.py`: copy the Hillview binaries to all machines in a Hillview cluster
- `deploy-hdfs.py`: download and install HDFS on all machines in a Hillview cluster
- `download-data.py`: download the specified files from all machines in a cluster
- `hillviewCommon.py`: common library used by the other Python programs
- `run-on-all.py`: run a command on all machines in a Hillview cluster
- `start.py`: start the Hillview service on a remote cluster
- `status.py`: check the status of the Hillview service on a remote cluster
- `stop.py`: stop the Hillview service on a remote cluster
- `upload-data.py`: upload a set of files to all machines in a Hillview cluster in a round-robin fashion
- `config.json`: skeleton configuration file for a Hillview cluster (a sketch of its shape is shown below)
- `config-local.json`: description of a Hillview cluster that consists of just the local machine (used both as a web server and as a worker)
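For illustration only, a cluster description has roughly the following shape (here saved as `myconfig.json`, as in the steps below); the field names in this example are assumptions, so treat the skeleton `config.json` itself as the authoritative reference:

```
$ cat myconfig.json
{
  "webserver": "web.server.name",
  "workers": ["worker1.name", "worker2.name"],
  "user": "hillview",
  "service_folder": "/home/hillview"
}
```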
- Copy the file `config.json` and modify it to describe your cluster. Let's say you saved it as `myconfig.json`.
- To run Hillview on the local machine just use `config-local.json`.
- You can install Hillview on your cluster by running `deploy.py myconfig.json`.
- You can start the Hillview service on the cluster by running `start.py myconfig.json`.
- You can stop the Hillview service on the cluster by running `stop.py myconfig.json`.
- You can check the status of the Hillview service on the cluster by running `status.py myconfig.json`.
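Putting these steps together, a typical session on the machine from which you administer the cluster might look like this:

```
$ cp config.json myconfig.json   # then edit myconfig.json to describe your cluster
$ ./deploy.py myconfig.json      # install the Hillview binaries on every machine
$ ./start.py myconfig.json       # start the Hillview service
$ ./status.py myconfig.json      # verify that the service is running
```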
Several scripts can be used to manage data distributed as raw files on a Hillview cluster. The convention is that a dataset is stored in one directory; the same directory is used on all machines, and each machine holds a fragment of the entire dataset.

Let's say we have a very large file `x.csv` that we want to upload to a cluster; we will chop it into pieces and install the pieces in the directory `data/x` on each machine (below the Hillview working directory). This is done with:
```
$ ./upload-file.sh -c myconfig.json -d data/x -h -f x.csv -o
```
The various flags have the following significance:

- `-c myconfig.json`: specifies the cluster where the data is uploaded
- `-d data/x`: specifies the directory where the data is uploaded on each machine
- `-h`: specifies that the file `x.csv` has a header row
- `-f x.csv`: specifies the input file
- `-o`: specifies that the output should be saved as ORC files (a fast columnar format)
After uploading the file in this way it can be loaded by selecting `Load / ORC files` and specifying:

- File name pattern: `data/x/x*.orc`
- Schema file: `schema`
Alternatively, you can split the file locally and upload the pieces afterwards; the following splits the file into pieces in the `tmp` directory and then uploads these pieces to the cluster using the `upload-data.py` program:

```
$ ./upload-file.sh -d tmp -h -f x.csv -o
$ ./upload-data.py -d data/x -s schema myconfig.json tmp/*.orc
```
To list the files on the cluster you can use the `run-on-all.py` script, e.g.:

```
$ ./run-on-all.py myconfig.json "ls -l data/x"
```
You can delete a directory from all machines of a cluster:

```
$ ./delete-data.py myconfig.json data/x
```
Finally, you can download back the data you have uploaded to the cluster:

```
$ ./download-data.py myconfig.json data/x
```

When downloading the files, this utility will locally create one folder for each machine in the cluster (see the example below).
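For example (using the hypothetical machine names from the configuration sketch above; the exact names of the downloaded fragments depend on how the file was chopped), after downloading `data/x` you might end up with a local tree such as:

```
worker1.name/data/x/x0.orc
worker2.name/data/x/x1.orc
```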