Describe the bug
As the title says, I have this problem, and I'm not sure what causes it. I see that there are 4 ollama processes each using around 100% CPU (according to htop) after I send a message, but in the beginning answers were instant. I checked RAM usage, and it seems constant at 6.27 GB.
Expected behavior
Fast responses.
Screenshots
If applicable, add screenshots to help explain your problem.
I see that you switched between the deepseek2 and llama models. I'm not an Alpaca dev, but I'm familiar with Ollama's codebase. If you launch multiple generations at once before the previous one finishes, both will be very slow. The waiting times happen because swapping models from llama to deepseek2 is limited by your RAM speed, or because your model was removed (or partially removed) from RAM and has to be loaded again from storage, which happens if you stop using Ollama for a while or stop the process. Additionally, within a conversation, if you keep adding to the same chat instead of starting a new one, all of the old chat messages must be processed by the model before it can handle your newest message, which adds a lot of loading time and makes the first token (the AI's first word) slow to appear. Tl;dr: make a new chat for each separate message, and it takes time to load models into RAM when switching between them. Nothing here is a bug, and there are no errors in the logs you shared.
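If you want to confirm that model loading/unloading is the cause, you can watch what Ollama keeps in memory. This is a rough sketch, assuming a recent stock Ollama install where the `ollama ps` subcommand and the `OLLAMA_KEEP_ALIVE` environment variable are available in your version:

```shell
# List which models are currently loaded in RAM and when they will be unloaded
ollama ps

# Keep models resident longer so they aren't dropped between messages
# (the default is a few minutes; -1 keeps them loaded indefinitely)
OLLAMA_KEEP_ALIVE=-1 ollama serve
```

If `ollama ps` shows the model disappearing between your messages, the slow first token is just reload time from storage, not an Alpaca bug.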
The six time=2025-02-08T15:36:09.733-05:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error" lines refer to when you instantiated more than one conversation at the same time, and it's not really an error: it just has to wait for the Ollama server to finish processing your first request before processing another one. As for the high CPU usage, you have a 4-core processor, and Ollama uses one process per core for multithreading. It's perfectly normal.
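If the queueing from overlapping requests bothers you, the Ollama server has environment variables for this. A hedged sketch; `OLLAMA_NUM_PARALLEL` and `OLLAMA_MAX_LOADED_MODELS` exist in recent Ollama versions, but check the FAQ/docs for the version you're running:

```shell
# Serve one request at a time so generations don't compete for CPU cores,
# and keep only one model in RAM so llama and deepseek2 don't evict each other
OLLAMA_NUM_PARALLEL=1 OLLAMA_MAX_LOADED_MODELS=1 ollama serve
```

Note that limiting these trades throughput for predictable latency on a single chat, which is usually the right trade-off on a 4-core machine.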
Are there other users who can provide their experiences with this? It'd be great to get some more responses so it's easier to assess the importance of this issue.
For me personally, I've also experienced similar behavior, but I firmly believe this is due to Ollama, not Alpaca, as @Pingasmaster rightfully said. Still, some more info from a few other people would be helpful just to be sure.
Debugging information
alpaca troubleshooting.txt