At least one NVIDIA 20-series or higher GPU with more than 12GB of VRAM is required.
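Whether a machine meets this requirement can be checked quickly with PyTorch. A minimal sketch, assuming `torch` is installed with CUDA support:

```python
import torch

# List every visible CUDA device with its total VRAM; the quantization
# step below needs a single GPU with more than 12 GB.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
```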
If you only need the quantized model, a pre-quantized Int4 build can be cloned directly:

```bash
git clone https://huggingface.co/openbmb/MiniCPM3-4B-GPTQ-Int4
```

To perform the quantization yourself, follow the steps below.
Acquire MiniCPM Model Weights

Hugging Face repositories store model weights via Git LFS, so make sure `git lfs install` has been run before cloning.

```bash
git clone https://huggingface.co/openbmb/MiniCPM3-4B
```
Acquire Quantization Script

```bash
git clone https://github.com/OpenBMB/MiniCPM
```
Install the AutoGPTQ Branch

For now, install from my forked branch (a PR has been submitted upstream):

```bash
git clone https://github.com/LDLINGLINGLING/AutoGPTQ.git
cd AutoGPTQ
git checkout minicpm3
pip install -e .
```
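The quantization script in the next step presumably wraps AutoGPTQ's standard quantization API. As a rough sketch of what that API looks like, with hypothetical paths and a toy one-sentence calibration set (the repository's `gptq_quantize.py` is the authoritative version):

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

pretrained_dir = "MiniCPM3-4B"           # hypothetical: path to the fp16 weights
quantized_dir = "MiniCPM3-4B-gptq-4bit"  # hypothetical: output directory

tokenizer = AutoTokenizer.from_pretrained(pretrained_dir, trust_remote_code=True)
# GPTQ needs calibration examples; a real run should use a few hundred samples.
examples = [tokenizer("MiniCPM3 is a 4B-parameter language model from OpenBMB.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(
    pretrained_dir, quantize_config, trust_remote_code=True
)
model.quantize(examples)          # run GPTQ calibration and weight quantization
model.save_quantized(quantized_dir)
```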
Start Quantization

```bash
cd MiniCPM/quantize
# Replace no_quant_model_path with the directory where the MiniCPM3 weights
# are saved, and quant_save_path with the directory where the quantized
# model should be written.
python gptq_quantize.py --pretrained_model_dir no_quant_model_path --quantized_model_dir quant_save_path --bits 4
```
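Once quantization finishes, the output can be loaded for inference. A minimal sketch, assuming the AutoGPTQ branch installed above and a hypothetical output path; the same code works for the pre-quantized download at the top of this section:

```python
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

quant_save_path = "MiniCPM3-4B-gptq-4bit"  # hypothetical: the --quantized_model_dir from above

# MiniCPM3 ships custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(quant_save_path, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_quantized(
    quant_save_path, device="cuda:0", trust_remote_code=True
)

inputs = tokenizer("Hello, MiniCPM3!", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```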