
How to use Gradio with GGUF model? #776

Open
AndyZocker opened this issue Jan 21, 2025 · 2 comments

Comments

@AndyZocker

I was able to install everything successfully on Windows, but I can't load a GGUF model. I entered "openbmb/MiniCPM-o-2_6-gguf" in model_server.py, but I get an error that no config.json was found. I'm really only interested in real-time voice chat, but I don't think the full standard model without GGUF will run on my RTX 3060 with 12 GB. Does something have to be changed in the code, or how do you get GGUF models to work with the provided Gradio demo? The videos also show it running on an iPad, and that certainly doesn't use the large model, right? Thanks in advance for any help.

@YuzaChongyi
Collaborator

You can try the int4 version; you only need to replace the model initialization with AutoGPTQForCausalLM.from_quantized in model_server.py.
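For reference, a minimal sketch of that change, assuming model_server.py originally loads the model via transformers' AutoModel.from_pretrained. The repo name `openbmb/MiniCPM-o-2_6-int4` and the keyword arguments are assumptions based on typical auto_gptq usage; adapt them to the actual loading code and model card in your checkout:

```python
# Minimal sketch, not the exact model_server.py code.
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

MODEL_PATH = "openbmb/MiniCPM-o-2_6-int4"  # assumed int4 (GPTQ) checkpoint, not the GGUF repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# Replace the original AutoModel.from_pretrained(...) call with from_quantized:
model = AutoGPTQForCausalLM.from_quantized(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device="cuda:0",
    trust_remote_code=True,
)
model.eval()
```

Note that the int4 checkpoint is a GPTQ quantization loaded through auto_gptq, which is a different format from GGUF; the GGUF repo is meant for llama.cpp-style runtimes, which is why passing it to the transformers loader fails with a missing config.json.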

@AndyZocker
Author

I still don't understand which code I need to change in model_server.py... is there a tutorial for dummies? I also keep getting an error for flash attention, which I did install after finally finding a version that works on my computer.
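(A hedged sketch of one common workaround for flash-attention load errors, not a confirmed fix from this thread: transformers lets you request PyTorch's built-in SDPA attention instead of flash_attention_2, so the flash-attn package is not needed. Whether MiniCPM-o's remote code honors this is an assumption; the repo name below is also illustrative.)

```python
import torch
from transformers import AutoModel

# Ask for the SDPA attention backend so flash-attn is not required.
model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-o-2_6",        # whatever path your model_server.py uses
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",     # instead of "flash_attention_2"
)
```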
