[Docs] Add Multi-scale training and testing (open-mmlab#630)

* add ms docs
* fix
* fix
* add en
* update
* update
* update
* update
Yuanyang-Zhu · Mar 7, 2023 · 421be53
1 parent 69b43e6
commit 421be53
Showing 7 changed files with 98 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -204,6 +204,7 @@ For different parts from MMDetection, we have also prepared user guides and adva
 - [Resume training](docs/en/common_usage/resume_training.md)
 - [Enabling and disabling SyncBatchNorm](docs/en/common_usage/syncbn.md)
 - [Enabling AMP](docs/en/common_usage/amp_training.md)
+- [Multi-scale training and testing](docs/en/common_usage/ms_training_testing.md)
 - [TTA Related Notes](docs/en/common_usage/tta.md)
 - [Add plugins to the backbone network](docs/en/common_usage/plugins.md)
 - [Freeze layers](docs/en/common_usage/freeze_layers.md)

diff --git a/README_zh-CN.md b/README_zh-CN.md
@@ -225,9 +225,10 @@ MMYOLO usage is almost identical to MMDetection, and all tutorials are interchangeable; you can also
 - [Resume training](docs/zh_cn/common_usage/resume_training.md)
 - [Enabling and disabling SyncBatchNorm](docs/zh_cn/common_usage/syncbn.md)
 - [Enabling mixed-precision training](docs/zh_cn/common_usage/amp_training.md)
+- [Multi-scale training and testing](docs/zh_cn/common_usage/ms_training_testing.md)
 - [Notes on test-time augmentation](docs/zh_cn/common_usage/tta.md)
 - [Add plugins to the backbone network](docs/zh_cn/common_usage/plugins.md)
-- [Freeze the weights of specified layers](docs/zh_cn/common_usage/common_usage/freeze_layers.md)
+- [Freeze the weights of specified layers](docs/zh_cn/common_usage/freeze_layers.md)
 - [Output model predictions](docs/zh_cn/common_usage/output_predictions.md)
 - [Set the random seed](docs/zh_cn/common_usage/set_random_seed.md)
 - [Tutorial on combining and replacing algorithm modules](docs/zh_cn/common_usage/module_combination.md)

diff --git a/configs/yolov5/yolov5_s-v61_fast_1xb12-ms-40e_cat.py b/configs/yolov5/yolov5_s-v61_fast_1xb12-ms-40e_cat.py
@@ -0,0 +1,13 @@
+_base_ = 'yolov5_s-v61_fast_1xb12-40e_cat.py'
+
+model = dict(
+    data_preprocessor=dict(
+        type='YOLOv5DetDataPreprocessor',
+        pad_size_divisor=32,
+        batch_augments=[
+            dict(
+                type='YOLOXBatchSyncRandomResize',
+                random_size_range=(480, 800),
+                size_divisor=32,
+                interval=1)
+        ]))
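Note on the config above: as a rough illustration of what `random_size_range=(480, 800)` combined with `size_divisor=32` implies, the sketch below enumerates the candidate training sizes and draws a new one every `interval` iterations. It is a minimal pure-Python sketch, not the `YOLOXBatchSyncRandomResize` implementation. The helper names `candidate_sizes` and `sample_sizes` are hypothetical, and the sketch assumes that both ends of the range are inclusive and that the target is square.

```python
import random


def candidate_sizes(size_range=(480, 800), divisor=32):
    """List every multiple of `divisor` inside the inclusive `size_range`."""
    low, high = size_range
    # Round the lower bound up to the nearest multiple of the divisor.
    first = -(-low // divisor) * divisor
    return list(range(first, high + 1, divisor))


def sample_sizes(num_iters, size_range=(480, 800), divisor=32,
                 interval=1, seed=0):
    """Pick a new square target size every `interval` iterations.

    Assumption for illustration: a uniform choice among the candidates.
    """
    rng = random.Random(seed)
    sizes = candidate_sizes(size_range, divisor)
    schedule = []
    current = None
    for it in range(num_iters):
        if it % interval == 0:
            current = rng.choice(sizes)
        schedule.append(current)
    return schedule


print(candidate_sizes())
# [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
print(sample_sizes(5))
```

With these settings there are 11 candidate sizes, and `interval=1` means the size may change on every iteration.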
diff --git a/docs/en/common_usage/ms_training_testing.md b/docs/en/common_usage/ms_training_testing.md
@@ -0,0 +1,39 @@
+# Multi-scale training and testing
+
+## Multi-scale training
+
+MMYOLO currently supports the popular YOLOv5, YOLOv6, YOLOv7, YOLOv8 and RTMDet algorithms, and their default configurations all use single-scale 640x640 training. Two implementations of multi-scale training are commonly used in the OpenMMLab family of open-source libraries:
+
+1. Each image output by `train_pipeline` has a variable scale, and the [stack_batch](https://github.com/open-mmlab/mmengine/blob/dbae83c52fa54d6dda08b6692b124217fe3b2135/mmengine/model/base_model/data_preprocessor.py#L260-L261) function in [DataPreprocessor](https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/data_preprocessors/data_preprocessor.py) pads the differently sized input images to the same scale so that they can form a batch. Most of the algorithms in MMDet use this approach.
+2. Each image output by `train_pipeline` has a fixed scale, and `DataPreprocessor` up- or down-samples the whole image batch directly to achieve multi-scale training.
+
+MMYOLO supports both approaches. In theory, the first can generate a richer set of scales, but because it augments each image independently, its training efficiency is lower than that of the second. We therefore recommend the second approach.
+
+Take the `configs/yolov5/yolov5_s-v61_fast_1xb12-40e_cat.py` configuration as an example. Its default is fixed-scale 640x640 training. To train with scales that are multiples of 32 in the range (480, 800), you can follow the YOLOX practice and use [YOLOXBatchSyncRandomResize](https://github.com/open-mmlab/mmyolo/blob/dc85144fab20a970341550794857a2f2f9b11564/mmyolo/models/data_preprocessors/data_preprocessor.py#L20) in the DataPreprocessor.
+
+Create a new configuration file at `configs/yolov5/yolov5_s-v61_fast_1xb12-ms-40e_cat.py` with the following contents:
+
+```python
+_base_ = 'yolov5_s-v61_fast_1xb12-40e_cat.py'
+
+model = dict(
+    data_preprocessor=dict(
+        type='YOLOv5DetDataPreprocessor',
+        pad_size_divisor=32,
+        batch_augments=[
+            dict(
+                type='YOLOXBatchSyncRandomResize',
+                # multi-scale range (480, 800)
+                random_size_range=(480, 800),
+                # The output scale needs to be divisible by 32
+                size_divisor=32,
+                interval=1)
+        ])
+)
+```
+
+The above configuration enables multi-scale training. For convenience, it is already provided under `configs/yolov5/`. The other YOLO-series algorithms can be configured in a similar way.
+
+## Multi-scale testing
+
+Multi-scale testing in MMYOLO is equivalent to Test-Time Augmentation (TTA) and is already supported. See [Test-Time Augmentation (TTA)](./tta.md) for details.
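Note on approach 1 described in the document above: when `train_pipeline` emits images of different sizes, the batch has to be padded to one common shape before it can be stacked. The sketch below computes that shape in plain Python. It assumes padding to the per-batch maximum height and width, each rounded up to a multiple of `pad_size_divisor`. `padded_batch_shape` is a hypothetical helper for illustration, not the `stack_batch` function itself.

```python
import math


def padded_batch_shape(shapes, pad_size_divisor=32):
    """Return the (height, width) that every image in the batch is padded to.

    `shapes` is a list of (height, width) tuples, one per image.
    """
    max_h = max(h for h, _ in shapes)
    max_w = max(w for _, w in shapes)
    # Round each side up to the next multiple of the divisor.
    pad_h = math.ceil(max_h / pad_size_divisor) * pad_size_divisor
    pad_w = math.ceil(max_w / pad_size_divisor) * pad_size_divisor
    return pad_h, pad_w


batch = [(480, 640), (500, 333), (427, 650)]
print(padded_batch_shape(batch))  # (512, 672)
```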
diff --git a/docs/en/index.rst b/docs/en/index.rst
@@ -38,6 +38,7 @@ You can switch between Chinese and English documents in the top-right corner of
    common_usage/resume_training.md
    common_usage/syncbn.md
    common_usage/amp_training.md
+   common_usage/ms_training_testing.md
    common_usage/tta.md
    common_usage/plugins.md
    common_usage/freeze_layers.md

diff --git a/docs/zh_cn/common_usage/ms_training_testing.md b/docs/zh_cn/common_usage/ms_training_testing.md
@@ -0,0 +1,41 @@
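+# Multi-scale training and testing
+
+## Multi-scale training
+
+MMYOLO currently supports the popular YOLOv5, YOLOv6, YOLOv7, YOLOv8 and RTMDet algorithms, and their default configurations all use single-scale 640x640 training. Two implementations of multi-scale training are commonly used in the OpenMMLab family of open-source libraries:
+
+1. Each image output by `train_pipeline` has a variable scale, and in [DataPreprocessor](https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/data_preprocessors/data_preprocessor.py) the differently sized input images
+   are padded to the same scale by the [stack_batch](https://github.com/open-mmlab/mmengine/blob/dbae83c52fa54d6dda08b6692b124217fe3b2135/mmengine/model/base_model/data_preprocessor.py#L260-L261) function so that they can form a batch for training. Most of the algorithms in MMDet use this approach.
+2. Each image output by `train_pipeline` has a fixed scale, and `DataPreprocessor` then up- or down-samples the whole batch of images directly to achieve multi-scale training.
+
+MMYOLO supports both approaches. In theory, the first can generate a richer set of scales, but because it augments each image independently, its training efficiency is lower than that of the second. We therefore recommend the second approach.
+
+Take the `configs/yolov5/yolov5_s-v61_fast_1xb12-40e_cat.py` configuration as an example. Its default is fixed-scale 640x640 training. To train with scales that are multiples of 32 in the range (480, 800), you can follow the YOLOX practice and use [YOLOXBatchSyncRandomResize](https://github.com/open-mmlab/mmyolo/blob/dc85144fab20a970341550794857a2f2f9b11564/mmyolo/models/data_preprocessors/data_preprocessor.py#L20) in the DataPreprocessor.
+
+Create a new configuration file at `configs/yolov5/yolov5_s-v61_fast_1xb12-ms-40e_cat.py` with the following contents: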
+
+```python
+_base_ = 'yolov5_s-v61_fast_1xb12-40e_cat.py'
+
+model = dict(
+    data_preprocessor=dict(
+        type='YOLOv5DetDataPreprocessor',
+        pad_size_divisor=32,
+        batch_augments=[
+            dict(
+                type='YOLOXBatchSyncRandomResize',
+                # The multi-scale range is 480 to 800
+                random_size_range=(480, 800),
+                # The output scale must be divisible by 32
+                size_divisor=32,
+                # Change the output scale every 1 iteration
+                interval=1)
+        ])
+)
+```
+
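+The above configuration enables multi-scale training. For convenience, it is already provided under `configs/yolov5/`. The other YOLO-series algorithms can be configured in a similar way.
+
+## Multi-scale testing
+
+Multi-scale testing in MMYOLO is equivalent to Test-Time Augmentation (TTA) and is already supported. See [Test-Time Augmentation (TTA)](./tta.md) for details.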
diff --git a/docs/zh_cn/index.rst b/docs/zh_cn/index.rst
@@ -38,6 +38,7 @@
    common_usage/resume_training.md
    common_usage/syncbn.md
    common_usage/amp_training.md
+   common_usage/ms_training_testing.md
    common_usage/tta.md
    common_usage/plugins.md
    common_usage/freeze_layers.md