Add support for deepseek architecture .gguf #36144
Comments
cc @SunMarc @muellerzr @MekkCyber - who's the right person to ping for GGUF loading?
Seems that all deepseek-r1 gguf checkpoints are sharded; I think we should add sharded gguf support first.
Anyway, setting aside the sharded gguf weights (we can merge them with the tool from llama.cpp), will #35926 block us for now? The deepseek-v3 support hasn't landed yet.
Yes, deepseek v3 is still not supported for now; the PR is functional, but some small adjustments are needed.
Hello @MekkCyber! Isn't the reason transformers doesn't support it that the deepseek .gguf files can't be merged?
Hello @zh-jp! I think it can be merged using the tool from llama.cpp mentioned above.
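For reference, a minimal sketch of merging the shards before loading, assuming llama.cpp's `llama-gguf-split` binary is on the PATH. The binary name and flags can differ between llama.cpp releases, and the shard/output file names below are purely illustrative:

```python
# A minimal sketch, assuming llama.cpp's llama-gguf-split binary is on PATH.
# Binary name and flags can differ between llama.cpp releases, and the
# shard/output file names here are purely illustrative.
import subprocess

def merge_gguf_shards(first_shard: str, merged_out: str) -> str:
    """Merge a *-00001-of-000XX.gguf shard series into one file by
    pointing llama.cpp's split/merge tool at the first shard."""
    subprocess.run(
        ["llama-gguf-split", "--merge", first_shard, merged_out],
        check=True,
    )
    return merged_out

merged = merge_gguf_shards(
    "DeepSeek-R1-Q4_K_M-00001-of-00009.gguf",  # first shard (illustrative name)
    "DeepSeek-R1-Q4_K_M.gguf",                 # merged single-file output
)
print("merged checkpoint written to", merged)
```

Merging only works around the sharding issue, though; transformers would still need to recognize the deepseek architecture inside the merged file.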
we need GGUF supported! please! |
6 similar comments
Feature request
The current version does not support .gguf files for the deepseek architecture. It would be great if the deepseek architecture were added to the list of supported model architectures. [supported-model-architectures]
Motivation
Some frameworks built on transformers (e.g. vLLM) raise an error when loading a .gguf file of a deepseek model or a quantized deepseek model.
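For context, this is how GGUF loading already works in transformers for a supported architecture such as Llama; the request is for the same `gguf_file` path to work for deepseek checkpoints. The model id and file name below are illustrative:

```python
# Loading a GGUF checkpoint for an already-supported architecture (Llama).
# The model id and file name are illustrative examples.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

# The gguf_file argument tells transformers to dequantize the GGUF
# weights into a standard torch model.
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)
```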
Your contribution
Is there any guidance to help users add the relevant support?
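As a rough starting point (an assumption for illustration, not a verified spec): GGUF support in transformers maps GGUF metadata keys onto config attributes, following the pattern of the existing entries in `src/transformers/integrations/ggml.py`. A deepseek entry might look roughly like this:

```python
# Illustrative assumption, not a verified spec: GGUF loading maps metadata
# keys from the .gguf header to transformers config attributes. The key
# names below mirror the style of existing entries in
# src/transformers/integrations/ggml.py, but are guesses for illustration.
DEEPSEEK_GGUF_CONFIG_MAPPING = {
    "context_length": "max_position_embeddings",
    "block_count": "num_hidden_layers",
    "embedding_length": "hidden_size",
    "feed_forward_length": "intermediate_size",
    "attention.head_count": "num_attention_heads",
    "attention.head_count_kv": "num_key_value_heads",
    "attention.layer_norm_rms_epsilon": "rms_norm_eps",
    "vocab_size": "vocab_size",
}
```

On top of the config mapping, the tensor names stored in the GGUF file also have to be mapped back to transformers parameter names, which is typically the bulk of the work when adding a new architecture.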