Commit e3e7c72

Add documentation about DALI proxy in EfficientNet and ResNet examples (#5800)

Signed-off-by: Joaquin Anton Guirao <[email protected]>
jantonguirao authored Feb 4, 2025
1 parent 43874c2 commit e3e7c72
Showing 4 changed files with 101 additions and 38 deletions.
34 changes: 32 additions & 2 deletions docs/examples/use_cases/pytorch/efficientnet/readme.rst
@@ -89,11 +89,27 @@ You may need to adjust the ``--batch-size`` parameter for your machine.

You can change the data loader and automatic augmentation scheme that are used by adding:

- * ``--data-backend``: ``dali`` | ``pytorch`` | ``synthetic``,
+ * ``--data-backend``: ``dali`` | ``dali_proxy`` | ``pytorch`` | ``synthetic``,
* ``--automatic-augmentation``: ``disabled`` | ``autoaugment`` | ``trivialaugment`` (the last one only for DALI),
* ``--dali-device``: ``cpu`` | ``gpu`` (only for DALI).

- By default, the DALI GPU variant with AutoAugment is used.
+ By default, the DALI GPU variant with AutoAugment is used (``dali`` and ``dali_proxy`` backends).

Data Backends
-------------

- **dali**:
Leverages a DALI pipeline along with DALI's PyTorch iterator for data loading, preprocessing, and augmentation.

- **dali_proxy**:
Uses a DALI pipeline for preprocessing and augmentation while relying on PyTorch's data loader. DALI Proxy facilitates the transfer of data to DALI for processing.
See :ref:`pytorch_dali_proxy`. A minimal usage sketch follows this list.

- **pytorch**:
Employs the native PyTorch data loader for data preprocessing and augmentation.

- **synthetic**:
Creates synthetic data on the fly, which is useful for testing and benchmarking purposes. This backend eliminates the need for actual datasets, providing a convenient way to simulate data loading.
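
The ``dali_proxy`` flow can be summarized in a short sketch. This is a minimal, illustrative example loosely following the DALI proxy documentation — the pipeline body, dataset path, and sizes are assumptions for illustration, not code from this example:

.. code-block:: python

    import numpy as np
    import torchvision.datasets as datasets
    from nvidia.dali import fn, pipeline_def, types
    from nvidia.dali.plugin.pytorch.experimental import proxy as dali_proxy

    @pipeline_def
    def decode_pipe():
        # Filepaths are delivered by the PyTorch dataloader workers via the proxy.
        filepaths = fn.external_source(name="images", no_copy=True)
        jpegs = fn.io.file.read(filepaths)
        images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
        return fn.resize(images, size=[224, 224])

    def read_filepath(path):
        # Hand the proxy the encoded file path instead of a decoded PIL image.
        return np.frombuffer(path.encode(), dtype=np.int8)

    pipe = decode_pipe(batch_size=64, num_threads=3, device_id=0)

    with dali_proxy.DALIServer(pipe) as dali_server:
        # The proxy stands in as the dataset transform; the actual work runs in DALI.
        dataset = datasets.ImageFolder("/path/to/imagenet/train",
                                       transform=dali_server.proxy,
                                       loader=read_filepath)
        loader = dali_proxy.DataLoader(dali_server, dataset,
                                       batch_size=64, num_workers=4)
        for images, labels in loader:
            pass  # images arrive as tensors already processed by the DALI pipeline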

For example, to run EfficientNet with AMP and a batch size of 128, using DALI with TrivialAugment, invoke:

@@ -161,6 +177,20 @@ To run training benchmarks with different data loaders and automatic augmentations:
    --workspace $RESULT_WORKSPACE \
    --report-file bench_report_dali_ta.json $PATH_TO_IMAGENET

# DALI proxy with AutoAugment
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 \
    --batch-size 128 --epochs 4 --no-checkpoints --training-only \
    --data-backend dali_proxy --automatic-augmentation autoaugment \
    --workspace $RESULT_WORKSPACE \
    --report-file bench_report_dali_proxy_aa.json $PATH_TO_IMAGENET

# DALI proxy with TrivialAugment
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 \
    --batch-size 128 --epochs 4 --no-checkpoints --training-only \
    --data-backend dali_proxy --automatic-augmentation trivialaugment \
    --workspace $RESULT_WORKSPACE \
    --report-file bench_report_dali_proxy_ta.json $PATH_TO_IMAGENET

# PyTorch without automatic augmentations
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 \
    --batch-size 128 --epochs 4 --no-checkpoints --training-only \
10 changes: 6 additions & 4 deletions docs/examples/use_cases/pytorch/resnet50/main.py
@@ -93,12 +93,14 @@ def parse():
'"dali" for DALI data loader, or "dali_proxy" for PyTorch dataloader with DALI proxy preprocessing.')
parser.add_argument('--prof', default=-1, type=int,
help='Only run 10 iterations for profiling.')
parser.add_argument('--deterministic', action='store_true')

parser.add_argument('--deterministic', action='store_true',
help='Enable deterministic behavior for reproducibility')
parser.add_argument('--fp16-mode', default=False, action='store_true',
help='Enable half precision mode.')
parser.add_argument('--loss-scale', type=float, default=1)
parser.add_argument('--channels-last', type=bool, default=False)
parser.add_argument('--loss-scale', type=float, default=1,
help='Scaling factor for loss to prevent underflow in FP16 mode.')
parser.add_argument('--channels-last', type=bool, default=False,
help='Use channels last memory format for tensors.')
parser.add_argument('-t', '--test', action='store_true',
help='Launch test mode with preset arguments')
args = parser.parse_args()
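
The two new help strings describe standard mixed-precision practice: the loss is scaled up before ``backward()`` so small FP16 gradients do not flush to zero, and channels-last switches tensors to NHWC memory layout. A minimal sketch of both ideas in plain PyTorch — illustrative only, not the code in ``main.py``:

.. code-block:: python

    import torch

    model = torch.nn.Conv2d(3, 8, 3).cuda().half()
    # Channels-last stores tensors in NHWC order, which Tensor Cores prefer.
    model = model.to(memory_format=torch.channels_last)

    x = torch.randn(4, 3, 32, 32, device="cuda", dtype=torch.float16)
    x = x.to(memory_format=torch.channels_last)

    loss_scale = 128.0  # role of --loss-scale
    loss = model(x).float().mean()

    # Scale up before backward so FP16 gradients stay representable...
    (loss * loss_scale).backward()
    # ...then scale gradients back down before the optimizer step.
    for p in model.parameters():
        p.grad.div_(loss_scale)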
94 changes: 62 additions & 32 deletions docs/examples/use_cases/pytorch/resnet50/pytorch-resnet50.rst
@@ -44,39 +44,69 @@ The default learning rate schedule starts at 0.1 and decays by a factor of 10 every 30 epochs.
python main.py -a alexnet --lr 0.01 [imagenet-folder with train and val folders]
Data loaders
------------

- **dali**:
Leverages a DALI pipeline along with DALI's PyTorch iterator for data loading, preprocessing, and augmentation.

- **dali_proxy**:
Uses a DALI pipeline for preprocessing and augmentation while relying on PyTorch's data loader. DALI Proxy facilitates the transfer of data to DALI for processing.
See :ref:`pytorch_dali_proxy`. A condensed loader-construction sketch follows this list.

- **pytorch**:
Employs the native PyTorch data loader for data preprocessing and augmentation.
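
All three loaders feed the same training loop; they differ only in how batches are produced. Below is a condensed, hypothetical dispatch — not the actual ``main.py`` code — showing how the ``pytorch`` and ``dali`` options are typically constructed (the ``dali_proxy`` variant is sketched in the EfficientNet section above):

.. code-block:: python

    import torchvision.datasets as datasets
    import torchvision.transforms as T
    from torch.utils.data import DataLoader
    from nvidia.dali import fn, pipeline_def, types
    from nvidia.dali.plugin.pytorch import DALIGenericIterator

    @pipeline_def
    def train_pipe(data_dir):
        # Read, decode, and resize entirely inside DALI.
        jpegs, labels = fn.readers.file(file_root=data_dir,
                                        random_shuffle=True, name="Reader")
        images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
        return fn.resize(images, size=[224, 224]), labels

    def build_loader(kind, data_dir, batch_size=256, workers=4):
        if kind == "pytorch":
            ds = datasets.ImageFolder(data_dir, T.Compose(
                [T.RandomResizedCrop(224), T.ToTensor()]))
            return DataLoader(ds, batch_size=batch_size,
                              num_workers=workers, shuffle=True)
        if kind == "dali":
            pipe = train_pipe(data_dir, batch_size=batch_size,
                              num_threads=workers, device_id=0)
            return DALIGenericIterator(pipe, ["data", "label"],
                                       reader_name="Reader")
        raise ValueError(f"unknown loader: {kind}")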

Usage
-----

.. code-block:: bash
- main.py [-h] [--arch ARCH] [-j N] [--epochs N] [--start-epoch N] [-b N] [--lr LR] [--momentum M] [--weight-decay W] [--print-freq N] [--resume PATH] [-e] [--pretrained] [--opt-level] DIR
- PyTorch ImageNet Training
- positional arguments:
-   DIR                   path(s) to dataset (if one path is provided, it is assumed to have subdirectories named "train" and "val"; alternatively, train and val paths can be specified directly by providing both paths as arguments)
- optional arguments (for the full list please check `Apex ImageNet example
- <https://github.com/NVIDIA/apex/tree/master/examples/imagenet>`_)
-   -h, --help            show this help message and exit
-   --arch ARCH, -a ARCH  model architecture: alexnet | resnet | resnet101
-                         | resnet152 | resnet18 | resnet34 | resnet50 | vgg
-                         | vgg11 | vgg11_bn | vgg13 | vgg13_bn | vgg16
-                         | vgg16_bn | vgg19 | vgg19_bn (default: resnet18)
-   -j N, --workers N     number of data loading workers (default: 4)
-   --epochs N            number of total epochs to run
-   --start-epoch N       manual epoch number (useful on restarts)
-   -b N, --batch-size N  mini-batch size (default: 256)
-   --lr LR, --learning-rate LR  initial learning rate
-   --momentum M          momentum
-   --weight-decay W, --wd W  weight decay (default: 1e-4)
-   --print-freq N, -p N  print frequency (default: 10)
-   --resume PATH         path to latest checkpoint (default: none)
-   -e, --evaluate        evaluate model on validation set
-   --pretrained          use pre-trained model
-   --dali_cpu            use CPU based pipeline for DALI, for heavy GPU
-                         networks it may work better, for IO bottlenecked
-                         one like RN18 GPU default should be faster
-   --data_loader         Select data loader: "pytorch" for native PyTorch data loader,
-                         "dali" for DALI data loader, or "dali_proxy" for PyTorch dataloader with DALI proxy preprocessing.
-   --fp16-mode           enables mixed precision mode
main.py [-h] [--arch ARCH] [-j N] [--epochs N] [--start-epoch N] [-b N] [--lr LR] [--momentum M] [--weight-decay W] [--print-freq N] [--resume PATH]
[-e] [--pretrained] [--dali_cpu] [--data_loader {pytorch,dali,dali_proxy}] [--prof PROF] [--deterministic] [--fp16-mode]
[--loss-scale LOSS_SCALE] [--channels-last CHANNELS_LAST] [-t]
[DIR ...]
PyTorch ImageNet Training
positional arguments:
DIR path(s) to dataset (if one path is provided, it is assumed to have subdirectories named "train" and "val"; alternatively, train and val paths can
be specified directly by providing both paths as arguments)
options:
-h, --help show this help message and exit
--arch ARCH, -a ARCH model architecture: alexnet | convnext_base | convnext_large | convnext_small | convnext_tiny | densenet121 | densenet161 | densenet169 |
densenet201 | efficientnet_b0 | efficientnet_b1 | efficientnet_b2 | efficientnet_b3 | efficientnet_b4 | efficientnet_b5 | efficientnet_b6 |
efficientnet_b7 | efficientnet_v2_l | efficientnet_v2_m | efficientnet_v2_s | get_model | get_model_builder | get_model_weights | get_weight |
googlenet | inception_v3 | list_models | maxvit_t | mnasnet0_5 | mnasnet0_75 | mnasnet1_0 | mnasnet1_3 | mobilenet_v2 | mobilenet_v3_large |
mobilenet_v3_small | regnet_x_16gf | regnet_x_1_6gf | regnet_x_32gf | regnet_x_3_2gf | regnet_x_400mf | regnet_x_800mf | regnet_x_8gf |
regnet_y_128gf | regnet_y_16gf | regnet_y_1_6gf | regnet_y_32gf | regnet_y_3_2gf | regnet_y_400mf | regnet_y_800mf | regnet_y_8gf | resnet101 |
resnet152 | resnet18 | resnet34 | resnet50 | resnext101_32x8d | resnext101_64x4d | resnext50_32x4d | shufflenet_v2_x0_5 | shufflenet_v2_x1_0 |
shufflenet_v2_x1_5 | shufflenet_v2_x2_0 | squeezenet1_0 | squeezenet1_1 | swin_b | swin_s | swin_t | swin_v2_b | swin_v2_s | swin_v2_t | vgg11 |
vgg11_bn | vgg13 | vgg13_bn | vgg16 | vgg16_bn | vgg19 | vgg19_bn | vit_b_16 | vit_b_32 | vit_h_14 | vit_l_16 | vit_l_32 | wide_resnet101_2 |
wide_resnet50_2 (default: resnet18)
-j N, --workers N number of data loading workers (default: 4)
--epochs N number of total epochs to run
--start-epoch N manual epoch number (useful on restarts)
-b N, --batch-size N mini-batch size per process (default: 256)
--lr LR, --learning-rate LR
Initial learning rate. Will be scaled by <global batch size>/256: args.lr = args.lr*float(args.batch_size*args.world_size)/256. A warmup schedule
will also be applied over the first 5 epochs.
--momentum M momentum
--weight-decay W, --wd W
weight decay (default: 1e-4)
--print-freq N, -p N print frequency (default: 10)
--resume PATH path to latest checkpoint (default: none)
-e, --evaluate evaluate model on validation set
--pretrained use pre-trained model
--dali_cpu Runs CPU based version of DALI pipeline.
--data_loader {pytorch,dali,dali_proxy}
Select data loader: "pytorch" for native PyTorch data loader, "dali" for DALI data loader, or "dali_proxy" for PyTorch dataloader with DALI proxy
preprocessing.
--prof PROF Only run 10 iterations for profiling.
--deterministic Enable deterministic behavior for reproducibility
--fp16-mode Enable half precision mode.
--loss-scale LOSS_SCALE
Scaling factor for loss to prevent underflow in FP16 mode.
--channels-last CHANNELS_LAST
Use channels last memory format for tensors.
-t, --test Launch test mode with preset arguments
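
As a worked example of the learning-rate rule quoted under ``--lr`` (values here are illustrative): with the default ``lr`` of 0.1, a per-process batch size of 256, and 8 processes, the scaled rate is 0.8.

.. code-block:: python

    # args.lr = args.lr * float(args.batch_size * args.world_size) / 256
    lr, batch_size, world_size = 0.1, 256, 8
    scaled_lr = lr * float(batch_size * world_size) / 256
    print(scaled_lr)  # 0.8 -- reached after the 5-epoch warmup
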
1 change: 1 addition & 0 deletions docs/plugins/pytorch_dali_proxy.rst
@@ -1,3 +1,4 @@
+ .. _pytorch_dali_proxy:
PyTorch DALI Proxy
==================

