
After a while, responses take a while to come through. #535

Open
stereomato opened this issue Feb 8, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@stereomato

Describe the bug
As the title says, I have this problem, and I'm not sure what causes it. After I send a message, I see four ollama processes each using around 100% CPU (according to htop). At the beginning, answers were instant. I checked RAM usage, and it seems to stay constant at 6.27 GB.

Expected behavior
Fast responses.


Debugging information

alpaca troubleshooting.txt

stereomato added the bug label on Feb 8, 2025
@Pingasmaster

Pingasmaster commented Feb 15, 2025

I see that you switched between the deepseek2 and llama models. I'm not an Alpaca dev, but I'm familiar with ollama's codebase. If you launch several generations at once before the previous one has finished, all of them will be very slow. The waiting times come either from swapping between llama and deepseek2, which is limited by your RAM speed, or from the model having been unloaded (or partially unloaded) from RAM and needing to be loaded again from storage, which happens if you stop using ollama for a while or stop the process. Additionally, if you keep adding to the same conversation instead of starting a new chat, all the old messages have to be reprocessed by the model before it can handle your newest one, which adds a lot of loading time and makes the first token (the AI's first word) slow to appear. Tl;dr: start a new chat for each separate topic, and expect some delay while models are loaded into RAM when switching between them. None of this is a bug, and there are no errors in the logs you shared.
The six time=2025-02-08T15:36:09.733-05:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error" lines refer to the moments when you started more than one generation at the same time. It's not really an error; ollama simply has to finish the first request before it can process the next one. As for the high CPU usage: you have a 4-core processor, and ollama uses one process per core for multithreading, so four processes at roughly 100% is perfectly normal.
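
If the unloading after idle time turns out to be the main source of your delays, one thing that may help (my assumption, not something shown in the attached logs) is raising ollama's keep_alive so the model stays resident in RAM longer between messages. Here is a minimal sketch against ollama's HTTP API in Python, assuming the default endpoint at http://localhost:11434 and using "llama3" as a placeholder model name:

```python
import requests

# Ask ollama to keep the model loaded for an hour of idle time after this
# request, instead of unloading it after the default timeout.
# "llama3" is a placeholder; substitute whichever model you actually use.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Hello!",
        "stream": False,
        "keep_alive": "1h",  # how long the model stays in RAM while idle
    },
    timeout=600,
)
print(response.json()["response"])
```

The same behavior can also be configured server-wide with the OLLAMA_KEEP_ALIVE environment variable on the machine running the ollama server.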

@mags0ft
Contributor

mags0ft commented Feb 21, 2025

Are there other users who can provide their experiences with this? It'd be great to get some more responses so it's easier to assess the importance of this issue.

For me personally, I've also experienced similar behavior, but I firmly believe this is due to Ollama, not Alpaca, as @Pingasmaster rightfully said. Still, some more information from a few other people would be helpful just to be sure.
