To quantize a MiniCPM model with GPTQ, follow the steps below and make sure your device meets the following requirements:
- at least one NVIDIA 20-series or newer GPU;
- at least 6GB of VRAM to quantize the 2B model;
- at least 4GB of VRAM to quantize the 1B model.
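To confirm that a suitable GPU and enough VRAM are available, you can run a quick check in a Python shell (an optional sketch, assuming PyTorch is already installed):
import torch
# List each visible GPU with its total VRAM so you can compare against the requirements above.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")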
For example, to quantize MiniCPM-2B-sft, first download the unquantized bf16 weights:
git clone https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16
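If git-lfs is not set up, the same weights can also be downloaded with the huggingface_hub package (a sketch; the local_dir path is only an example):
from huggingface_hub import snapshot_download
# Download the unquantized MiniCPM-2B-sft-bf16 weights to a local directory of your choice.
snapshot_download(repo_id="openbmb/MiniCPM-2B-sft-bf16", local_dir="./MiniCPM-2B-sft-bf16")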
Since AutoGPTQ is no longer maintained, we will use a fork that carries a MiniCPM branch:
git clone https://github.com/LDLINGLINGLING/AutoGPTQ
Enter the AutoGPTQ directory, switch to the MiniCPM branch, and install the package in editable mode:
cd AutoGPTQ
git checkout minicpm_autogptq
pip install -e .
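To verify that the installation succeeded, you can try importing the package (a quick optional check):
python -c "from auto_gptq import AutoGPTQForCausalLM; print('AutoGPTQ import OK')"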
Navigate to the quantization directory inside the MiniCPM repository:
cd MiniCPM/quantize
Run the quantization script, replacing no_quant_model_path with the path to the unquantized MiniCPM weights and quant_save_path with the directory where the quantized model should be saved:
python gptq_quantize.py --pretrained_model_dir no_quant_model_path --quantized_model_dir quant_save_path --bits 4
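For reference, the script performs a standard AutoGPTQ 4-bit quantization flow. The sketch below illustrates that flow with the library's public API; it is not the contents of gptq_quantize.py, and the paths and calibration texts are placeholders you would replace with your own:
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

no_quant_model_path = "path/to/MiniCPM-2B-sft-bf16"   # unquantized weights (placeholder)
quant_save_path = "path/to/MiniCPM-2B-sft-gptq-4bit"  # output directory (placeholder)

tokenizer = AutoTokenizer.from_pretrained(no_quant_model_path, trust_remote_code=True)

# A few calibration samples; a real run should use several hundred representative texts.
calibration_texts = [
    "MiniCPM is an end-side large language model developed by OpenBMB.",
    "GPTQ quantization compresses model weights to reduce memory usage.",
]
examples = [dict(tokenizer(text, return_tensors="pt")) for text in calibration_texts]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(
    no_quant_model_path, quantize_config, trust_remote_code=True
)
model.quantize(examples)               # run GPTQ calibration on the sample inputs
model.save_quantized(quant_save_path)  # write the quantized weights
tokenizer.save_pretrained(quant_save_path)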
After completing the above steps, the quantized 4-bit MiniCPM weights will be saved under quant_save_path.
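To spot-check the result, the quantized model can be loaded back with AutoGPTQ and used for a short generation (a sketch reusing the placeholder path from above):
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

quant_save_path = "path/to/MiniCPM-2B-sft-gptq-4bit"  # placeholder, same as above

tokenizer = AutoTokenizer.from_pretrained(quant_save_path, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_quantized(quant_save_path, device="cuda:0", trust_remote_code=True)

inputs = tokenizer("MiniCPM is", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))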