replace python/ray/spark launcher options docs with a single doc on launcher options

Signed-off-by: David Wood <[email protected]>
daw3rd committed Jan 22, 2025
1 parent 0598b5f commit a9bcc22
Showing 41 changed files with 104 additions and 194 deletions.
2 changes: 1 addition & 1 deletion data-processing-lib/doc/advanced-transform-tutorial.md
Original file line number Diff line number Diff line change
Expand Up @@ -302,5 +302,5 @@ as follows:
```shell
make run-cli-sample
```
See the [launcher options](ray-launcher-options.md) for a complete list of
See the [launcher options](launcher-options.md) for a complete list of
transform-independent command line options.
2 changes: 1 addition & 1 deletion data-processing-lib/doc/data-access-factory.md
Expand Up @@ -9,7 +9,7 @@ the processing of input data files and the expected destination
of the processed files.
The `DataAccessFactory` is most often configured using command line arguments
to specify the type of `DataAccess` instance to create
(see `--data_*` options [here](python-launcher-options.md).
(see the `--data_*` options [here](launcher-options.md)).
Currently, it supports
[DataAccessLocal](../python/src/data_processing/data_access/data_access_local.py)
and
@@ -1,24 +1,16 @@
# Ray Launcher Command Line Options
A number of command line options are available when launching a transform.
# Runtime Command Line Options

The following is a current --help output (a work in progress) for
the `NOOPTransform` (note the --noop_sleep_sec and --noop_pwd options):
A number of command line options are available when launching a transform:
* Transform options, defined by the specific transform.
* Runtime/launcher-independent options, primarily for identifying data sources and destinations.
* Runtime-specific options for controlling aspects of the individual runtime.

The runtime options are discussed below (see the specific transform's documentation, or run it with `--help`,
to determine the transform options).

## Runtime-independent Launcher CLI Arguments
The following command line launcher options are available to all runtimes.
```
usage: runtime.py [-h] [--run_locally RUN_LOCALLY] [--noop_sleep_sec NOOP_SLEEP_SEC] [--noop_pwd NOOP_PWD] [--data_s3_cred DATA_S3_CRED] [--data_s3_config DATA_S3_CONFIG] [--data_local_config DATA_LOCAL_CONFIG]
[--data_max_files DATA_MAX_FILES] [--data_checkpointing DATA_CHECKPOINTING] [--data_data_sets DATA_DATA_SETS] [--data_files_to_use DATA_FILES_TO_USE] [--data_num_samples DATA_NUM_SAMPLES]
[--runtime_num_workers RUNTIME_NUM_WORKERS] [--runtime_worker_options RUNTIME_WORKER_OPTIONS] [--runtime_creation_delay RUNTIME_CREATION_DELAY] [--runtime_pipeline_id RUNTIME_PIPELINE_ID]
[--runtime_job_id RUNTIME_JOB_ID] [--runtime_code_location RUNTIME_CODE_LOCATION]
Driver for noop processing
options:
-h, --help show this help message and exit
--run_locally RUN_LOCALLY
running ray local flag
--noop_sleep_sec NOOP_SLEEP_SEC
Sleep actor for a number of seconds while processing the data frame, before writing the file to COS
--noop_pwd NOOP_PWD A dummy password which should be filtered out of the metadata
--data_s3_cred DATA_S3_CRED
AST string of options for s3 credentials. Only required for S3 data access.
access_key: access key help text
Expand Down Expand Up @@ -49,6 +41,29 @@ options:
list of file extensions to choose for input.
--data_num_samples DATA_NUM_SAMPLES
number of random input files to process
```
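
Some of the options above, such as `--data_s3_cred`, take an "AST string": a Python literal passed as a single argument. The following is a minimal sketch of how such a value could be parsed, assuming `argparse` and `ast.literal_eval` from the standard library. It is not the launcher's actual implementation; the `parse_ast_option` helper and the stand-in parser are hypothetical, and only the option names come from the help text above.

```python
import argparse
import ast


def parse_ast_option(value: str) -> dict:
    """Parse a Python-literal ("AST string") option value into a dict (hypothetical helper)."""
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError) as exc:
        raise argparse.ArgumentTypeError(f"not a valid Python literal: {exc}")
    if not isinstance(parsed, dict):
        raise argparse.ArgumentTypeError("expected a dict literal")
    return parsed


# Stand-in parser for illustration only; the real launcher defines its own.
parser = argparse.ArgumentParser(description="hypothetical stand-in for a launcher")
parser.add_argument("--data_local_config", type=parse_ast_option)
parser.add_argument("--data_max_files", type=int)

args = parser.parse_args(
    [
        "--data_local_config",
        "{'input_folder': '/tmp/input', 'output_folder': '/tmp/output'}",
        "--data_max_files",
        "10",
    ]
)
print(args.data_local_config["input_folder"], args.data_max_files)
```

Using `ast.literal_eval` rather than `eval` means only literals (dicts, lists, strings, numbers and so on) are accepted, so an option value cannot execute arbitrary code.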

## Python Launcher CLI Arguments
The following command line launcher options are available for the Python runtime.
```
--runtime_num_processors RUNTIME_NUM_PROCESSORS
size of multiprocessing pool
--runtime_pipeline_id RUNTIME_PIPELINE_ID
pipeline id
--runtime_job_id RUNTIME_JOB_ID
job id
--runtime_code_location RUNTIME_CODE_LOCATION
AST string containing code location
github: Github repository URL.
commit_hash: github commit hash
path: Path within the repository
Example: { 'github': 'https://github.com/somerepo', 'commit_hash': '1324',
'path': 'transforms/universal/code' }
```
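
The `--runtime_code_location` value is a dict literal passed as one argument. As a rough sketch, assuming the caller assembles the argument list in Python, the value can be produced with `repr()` and checked by round-tripping it through `ast.literal_eval`. The pipeline and job identifiers below are hypothetical placeholders; the dict contents repeat the example from the help text.

```python
import ast

# Example values taken from the help text above.
code_location = {
    "github": "https://github.com/somerepo",
    "commit_hash": "1324",
    "path": "transforms/universal/code",
}

# repr() of a dict of plain strings is itself a valid Python literal.
code_location_arg = repr(code_location)

# Hypothetical pipeline/job ids, for illustration only.
cli_args = [
    "--runtime_pipeline_id", "my_pipeline",
    "--runtime_job_id", "my_job",
    "--runtime_code_location", code_location_arg,
]

# Round-trip check: the string parses back to the original dict.
assert ast.literal_eval(code_location_arg) == code_location
print(cli_args)
```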
## Ray Launcher CLI Arguments
The following command line launcher options are available for the Ray runtime.
```
--runtime_num_workers RUNTIME_NUM_WORKERS
number of workers
--runtime_worker_options RUNTIME_WORKER_OPTIONS
Expand Down Expand Up @@ -77,3 +92,18 @@ options:
Example: { 'github': 'https://github.com/somerepo', 'commit_hash': '1324',
'path': 'transforms/universal/code' }
```
## Spark Launcher CLI Arguments
The following command line launcher options are available for the Spark runtime.
```
--runtime_pipeline_id RUNTIME_PIPELINE_ID
pipeline id
--runtime_job_id RUNTIME_JOB_ID
job id
--runtime_code_location RUNTIME_CODE_LOCATION
AST string containing code location
github: Github repository URL.
commit_hash: github commit hash
path: Path within the repository
Example: { 'github': 'https://github.com/somerepo', 'commit_hash': '1324',
'path': 'transforms/universal/code' }
```
64 changes: 0 additions & 64 deletions data-processing-lib/doc/python-launcher-options.md

This file was deleted.

2 changes: 1 addition & 1 deletion data-processing-lib/doc/ray-runtime.md
Expand Up @@ -48,7 +48,7 @@ Note that the launcher defines some additional CLI parameters that are used to c
[data access](../python/src/data_processing/data_access/data_access_factory.py). Things such as data access configuration,
number of workers, worker resources, etc.
Discussion of these options is beyond the scope of this document
(see [Launcher Options](ray-launcher-options.md) for a list of available options.)
(see [Launcher Options](launcher-options.md) for a list of available options.)

## Transform Configuration
In general, a transform should be able to run in both the python and Ray runtimes.
4 changes: 2 additions & 2 deletions data-processing-lib/doc/simplest-transform-tutorial.md
Expand Up @@ -217,7 +217,7 @@ To run
python noop_main.py --noop_sleep_sec 2 \
--data_local_config "{'input_folder': '"$NOOP_INPUT"', 'output_folder': '/tmp/noop-output'}"
```
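
The nested shell quoting in the command above is easy to get wrong. As an alternative sketch, assuming the command is assembled from Python, the dict literal can be generated with `repr()` and the whole command quoted with `shlex.join` (Python 3.8+). The fallback input path is a hypothetical placeholder; `noop_main.py` and the option names come from the command above.

```python
import ast
import os
import shlex

# NOOP_INPUT is read from the environment; the fallback path is hypothetical.
input_folder = os.environ.get("NOOP_INPUT", "/tmp/noop-input")
local_config = {"input_folder": input_folder, "output_folder": "/tmp/noop-output"}

command = [
    "python", "noop_main.py",
    "--noop_sleep_sec", "2",
    "--data_local_config", repr(local_config),  # dict literal as one argument
]

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

The list form can also be passed directly to `subprocess.run(command)`, which avoids shell quoting entirely.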
See the [python launcher options](python-launcher-options.md) for a complete list of
See the [launcher options](launcher-options.md) for a complete list of
transform-independent command line options.

### Ray Runtime
Expand All @@ -241,5 +241,5 @@ python noop_main.py --noop_sleep_sec 2 \
--data_local_config "{'input_folder': '"$NOOP_INPUT"', 'output_folder': '/tmp/noop-output'}" --run_locally True
```
which will start a local Ray instance (Ray should be [pre-installed](https://docs.ray.io/en/latest/ray-overview/installation.html)).
See the [ray launcher options](ray-launcher-options.md) for a complete list of
See the [launcher options](launcher-options.md) for a complete list of
transform-independent command line options.
63 changes: 0 additions & 63 deletions data-processing-lib/doc/spark-launcher-options.md

This file was deleted.

2 changes: 1 addition & 1 deletion doc/quick-start/run-transform-image.md
Expand Up @@ -84,7 +84,7 @@ docker run --rm
```
This is functionally equivalent to the python-runtime, but additional
configuration can be provided (see the
[ray launcher args](../../data-processing-lib/doc/ray-launcher-options.md))
[launcher args](../../data-processing-lib/doc/launcher-options.md))
for details.

### S3-located Data - Python Runtime
5 changes: 5 additions & 0 deletions transforms/README.md.template
Expand Up @@ -36,6 +36,11 @@ found in [XYZTransform](dpk_xyz/transform.py)
|------------------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
| xyz_ | ... | ... |

When running the transform with a launcher (i.e. TransformLauncher),
the above are available as command line options in addition to
[the options provided by the launcher](../../../../data-processing-lib/doc/launcher-options.md).


## Usage

### Command Line-Launched
2 changes: 1 addition & 1 deletion transforms/code/code2parquet/python/README.md
Expand Up @@ -95,7 +95,7 @@ the file specified in `supported_langs_file`.
### Launched Command Line Options
When running the transform with the Ray launcher (i.e. TransformLauncher),
the following command line arguments are available in addition to
[the options provided by the launcher](../../../../data-processing-lib/doc/ray-launcher-options.md).
[the options provided by the launcher](../../../../data-processing-lib/doc/launcher-options.md).

* `--code2parquet_supported_langs_file` - set the `supported_langs_file` configuration key.
* `--code2parquet_detect_programming_lang` - set the `detect_programming_lang` configuration key.
2 changes: 1 addition & 1 deletion transforms/code/code2parquet/ray/README.md
Expand Up @@ -16,7 +16,7 @@ code2parquet transform configuration and command line options are the same as fo
### Launched Command Line Options
In addition to those available to the transform as defined in [here](../python/README.md),
the set of
[ray launcher](../../../../data-processing-lib/doc/ray-launcher-options.md) are available.
[launcher options](../../../../data-processing-lib/doc/launcher-options.md) are available.

### Running the samples
To run the samples, use the following `make` targets
2 changes: 1 addition & 1 deletion transforms/code/code_profiler/README.md
Expand Up @@ -124,7 +124,7 @@ Document Quality configuration and command line options are the same as for the
When running the transform with the Ray launcher (i.e., TransformLauncher),
in addition to those available to the transform as defined here,
the set of
[ray launcher](../../../data-processing-lib/doc/ray-launcher-options.md) are available.
[launcher options](../../../data-processing-lib/doc/launcher-options.md) are available.

#### Running the samples
To run the samples, use the following `make` target
3 changes: 1 addition & 2 deletions transforms/code/code_quality/python/README.md
Expand Up @@ -38,8 +38,7 @@ It uses a tokenizer to collect metrics specific to token ratio. It is designed
### Launcher Command Line Options

The following command line arguments are available in addition to
the options provided by the [ray launcher](../../../../data-processing-lib/doc/ray-launcher-options.md)
and the [python launcher](../../../../data-processing-lib/doc/python-launcher-options.md).
the options provided by the [launcher](../../../../data-processing-lib/doc/launcher-options.md).

* "--contents_column_name" - input a column name which contains data to process. The default column name: `contents`
* "--language_column_name" - input a column name which contains programming language details. The default column name: `language`
2 changes: 1 addition & 1 deletion transforms/code/code_quality/ray/README.md
Expand Up @@ -18,7 +18,7 @@ Transform configuration options are the same as the base python transform.
### Launched Command Line Options
In addition to those available to the transform as defined in [here](../python/README.md),
the set of
[ray launcher](../../../../data-processing-lib/doc/ray-launcher-options.md) are available.
[launcher options](../../../../data-processing-lib/doc/launcher-options.md) are available.

### Running the samples
To run the samples, use the following `make` targets
2 changes: 1 addition & 1 deletion transforms/code/header_cleanser/python/README.md
Expand Up @@ -28,7 +28,7 @@ You can run the [header_cleanser_local.py](src/header_cleanser_local.py) (python
### Launched Command Line Options
When running the transform with the Ray launcher (i.e. TransformLauncher),
the following command line arguments are available in addition to
the [python launcher](../../../../data-processing-lib/doc/python-launcher-options.md).
the [launcher](../../../../data-processing-lib/doc/launcher-options.md).
* --header_cleanser_contents_column_name - set the contents_column_name configuration key.
* --header_cleanser_document_id_column_name - set the document_id_column_name configuration key.
* --header_cleanser_license - set the license configuration key.
3 changes: 2 additions & 1 deletion transforms/code/header_cleanser/ray/README.md
Expand Up @@ -17,7 +17,8 @@ This project wraps the [header cleanser transform](../python) with a Ray runtime
## Running

### Launched Command Line Options
In addition to those available to the transform as defined in [here](../python/README.md), the set of [ray launcher](../../../../data-processing-lib/doc/ray-launcher-options.md) are available.
In addition to those available to the transform as defined in [here](../python/README.md),
the set of [launcher options](../../../../data-processing-lib/doc/launcher-options.md) are available.

### Running the samples
To run the samples, use the following `make` targets
2 changes: 1 addition & 1 deletion transforms/code/license_select/python/README.md
Expand Up @@ -47,7 +47,7 @@ The transform can be configured with the following key/value pairs from the conf
### Launcher Command Line Options

The following command line arguments are available in addition to
the options provided by the [python launcher](../../../../data-processing-lib/doc/python-launcher-options.md).
the options provided by the [launcher](../../../../data-processing-lib/doc/launcher-options.md).

`--lc_license_column_name` - set the name of the column that holds the license to process

2 changes: 1 addition & 1 deletion transforms/code/license_select/ray/README.md
Expand Up @@ -15,7 +15,7 @@ This project wraps the [license select transform](../python/README.md) with a Ra

In addition to those available to the transform as defined in [here](../python/README.md),
the set of
[ray launcher](../../../../data-processing-lib/doc/ray-launcher-options.md) are available.
[launcher options](../../../../data-processing-lib/doc/launcher-options.md) are available.

### Running the samples
