This repo contains an example training workflow consisting of three simple SkyPilot tasks:
data_preprocessing
: Ingests data and writes it to a bucket.train
: Trains a model on the data.eval
: Evaluates the model. Can be optionally run on the same cluster astrain
.
These tasks don't actually run any code, but they demonstrate how to structure a training workflow on SkyPilot.
When developing the workflow, you can run the tasks independently using sky launch
:
$ sky launch -c data data_preprocessing.yaml
...
The train and eval step can be run in a similar way, and the eval step can be run on the same cluster as the train step by setting the --cluster
flag:
$ sky launch -c train train.yaml
...
# Run eval on the same cluster as train
$ sky exec train eval.yaml