[doc] qconfig layerwise modify (#342)

* [doc]set different qconfig for different module
* [doc]fix typo
* [doc]fix typo
alibaba · Jul 15, 2024 · 3423a9c
1 parent 9876621

Showing 2 changed files with 77 additions and 0 deletions.
diff --git a/docs/FAQ.md b/docs/FAQ.md
@@ -75,6 +75,45 @@ with model_tracer():
 ```
 
 
+#### How to set a more flexible QConfig?
+Q: How can I use different quantization configurations for different layers, for example a different observer per layer?
+
+A: Set `override_qconfig_func` in the `config` passed when initializing the `Quantizer`. Its value is a user-defined function that takes a module's name and the module, and returns the QConfig for the corresponding op. The example below selects `MinMaxObserver` by module name or module type. More `FakeQuantize` and `Observer` implementations are available in the official `torch.quantization` library, or you can [customize your own implementations](../tinynn/graph/quantization/fake_quantize.py).
+
+The module names can be found in the generated traced model definition in `out/Qxx.py`.
+
+```python
+import torch
+from torch.quantization import FakeQuantize, MinMaxObserver
+from torch.ao.nn.intrinsic import ConvBnReLU2d
+def set_MinMaxObserver(name, module):
+    # Use MinMaxObserver for the weights and activations of modules matched by name or by type.
+    if name in ['model_0_0', 'model_0_1'] or isinstance(module, ConvBnReLU2d):
+        weight_fq = FakeQuantize.with_args(
+            observer=MinMaxObserver,
+            quant_min=-128,
+            quant_max=127,
+            dtype=torch.qint8,
+            qscheme=torch.per_tensor_symmetric,
+            reduce_range=False,
+        )
+        act_fq = FakeQuantize.with_args(
+            observer=MinMaxObserver,
+            quant_min=0,
+            quant_max=255,
+            dtype=torch.quint8,
+            reduce_range=False,
+        )
+        qconfig_new = torch.quantization.QConfig(act_fq, weight_fq)
+        return qconfig_new
+```
+```python
+with model_tracer():
+    quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'override_qconfig_func': set_MinMaxObserver})
+    qat_model = quantizer.quantize()
+```
+
+
 #### How to handle the case of inconsistent training and inference computation graphs?
 
 Q: Models may have some extra logic in the training phase that are not needed in inference, such as the model below (which is also a common scenario in real world OCR and face recognition).
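
The dispatch pattern behind the `override_qconfig_func` example added to `docs/FAQ.md` can be tried without PyTorch or TinyNN installed. The following is a minimal plain-Python sketch under stated assumptions: `SketchQConfig`, `ConvBnReLU2dStub`, `LinearStub`, and `apply_overrides` are hypothetical stand-ins invented for illustration, and treating a `None` return as "keep the default" is an assumption, not confirmed TinyNN behavior. It only shows how a `(name, module)` callback can select modules by name or by type.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class SketchQConfig:
    """Hypothetical stand-in for torch.quantization.QConfig."""
    activation: str
    weight: str


class ConvBnReLU2dStub:
    """Hypothetical stand-in for a fused conv module type."""


class LinearStub:
    """Hypothetical stand-in for any other module type."""


DEFAULT = SketchQConfig(activation="default_act", weight="default_weight")
MINMAX = SketchQConfig(activation="minmax_act", weight="minmax_weight")


def override_by_name_or_type(name: str, module: object) -> Optional[SketchQConfig]:
    # Match either an explicit module name or a module type.
    if name in ("model_0_0", "model_0_1") or isinstance(module, ConvBnReLU2dStub):
        return MINMAX
    # No match: return None (assumed here to mean "keep the default").
    return None


def apply_overrides(
    modules: Dict[str, object],
    override_func: Callable[[str, object], Optional[SketchQConfig]],
) -> Dict[str, SketchQConfig]:
    # Hypothetical walker: ask the callback about every module and
    # fall back to DEFAULT when it returns None.
    resolved = {}
    for name, module in modules.items():
        new_config = override_func(name, module)
        resolved[name] = DEFAULT if new_config is None else new_config
    return resolved


modules = {
    "model_0_0": LinearStub(),        # selected by name
    "model_0_5": ConvBnReLU2dStub(),  # selected by type
    "model_1_0": LinearStub(),        # not selected
}
resolved = apply_overrides(modules, override_by_name_or_type)
```

In this sketch `model_0_0` is matched by name, `model_0_5` by type, and `model_1_0` keeps the default configuration.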

diff --git a/docs/FAQ_zh-CN.md b/docs/FAQ_zh-CN.md
@@ -73,6 +73,44 @@ with model_tracer():
     qat_model = quantizer.quantize()
 ```
 
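+#### How to set a more flexible QConfig?
+Q: How can I use different quantization configurations for different layers, for example a different observer per layer?
+
+A: Set `override_qconfig_func` in the `config` passed when initializing the `Quantizer`. Its value is a user-defined function that modifies the QConfig of the corresponding op. The example below selects `MinMaxObserver` by module name or module type. More `FakeQuantize` and `Observer` implementations are available in the official `torch.quantization` library, or you can [customize your own implementations](../tinynn/graph/quantization/fake_quantize.py).
+
+The module names can be found in the generated model definition in `out/Qxx.py`.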
+
+```python
+import torch
+from torch.quantization import FakeQuantize, MinMaxObserver
+from torch.ao.nn.intrinsic import ConvBnReLU2d
+def set_MinMaxObserver(name, module):
+    # Use MinMaxObserver for the weights and activations of modules matched by name or by type.
+    if name in ['model_0_0', 'model_0_1'] or isinstance(module, ConvBnReLU2d):
+        weight_fq = FakeQuantize.with_args(
+            observer=MinMaxObserver,
+            quant_min=-128,
+            quant_max=127,
+            dtype=torch.qint8,
+            qscheme=torch.per_tensor_symmetric,
+            reduce_range=False,
+        )
+        act_fq = FakeQuantize.with_args(
+            observer=MinMaxObserver,
+            quant_min=0,
+            quant_max=255,
+            dtype=torch.quint8,
+            reduce_range=False,
+        )
+        qconfig_new = torch.quantization.QConfig(act_fq, weight_fq)
+        return qconfig_new
+```
+```python
+with model_tracer():
+    quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'override_qconfig_func': set_MinMaxObserver})
+    qat_model = quantizer.quantize()
+```
+
 
 #### How to handle the case of inconsistent training and inference computation graphs?