
How to use Gradio with GGUF model? #776

Open
AndyZocker opened this issue Jan 21, 2025 · 2 comments

Comments

@AndyZocker

I was able to install everything successfully on Windows, but I can't load a GGUF model. I entered "openbmb/MiniCPM-o-2_6-gguf" in model_server.py, but I get an error that no config.json was found. I'm really only interested in real-time voice chat, but I don't think the full standard model without GGUF will run on my RTX 3060 with 12 GB. Does something have to be changed in the code, or how do you get GGUF models to work with the provided Gradio demo? The videos also show it running on an iPad, and that certainly doesn't use the large model, right? Thanks in advance for any help.

@YuzaChongyi
Collaborator

You can try the int4 version; you only need to replace the model initialization with AutoGPTQForCausalLM.from_quantized in model_server.py.
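For reference, a minimal sketch of that change, assuming model_server.py originally loads the model via transformers' AutoModel.from_pretrained. The repo name `openbmb/MiniCPM-o-2_6-int4` and the keyword arguments are assumptions based on typical auto_gptq usage; adapt them to the actual loading code and model card in your checkout:

```python
# Minimal sketch, not the exact model_server.py code.
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

MODEL_PATH = "openbmb/MiniCPM-o-2_6-int4"  # assumed int4 (GPTQ) checkpoint, not the GGUF repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# Replace the original AutoModel.from_pretrained(...) call with from_quantized:
model = AutoGPTQForCausalLM.from_quantized(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device="cuda:0",
    trust_remote_code=True,
)
model.eval()
```

Note that the int4 checkpoint is a GPTQ quantization loaded through auto_gptq, which is a different format from GGUF; the GGUF repo is meant for llama.cpp-style runtimes, which is why passing it to the transformers loader fails with a missing config.json.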

@AndyZocker
Author

I still don't understand which code I need to change in model_server.py... is there a tutorial for dummies? I also keep getting an error for flash attention, which I did install after finally finding a version that works on my computer.
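(A hedged sketch of one common workaround for flash-attention load errors, not a confirmed fix from this thread: transformers lets you request PyTorch's built-in SDPA attention instead of flash_attention_2, so the flash-attn package is not needed. Whether MiniCPM-o's remote code honors this is an assumption; the repo name below is also illustrative.)

```python
import torch
from transformers import AutoModel

# Ask for the SDPA attention backend so flash-attn is not required.
model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-o-2_6",        # whatever path your model_server.py uses
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",     # instead of "flash_attention_2"
)
```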
