[BUG]: llama3.1 8B Context Size Max Tokens Ignored in Both Performance Modes #2442

Open
rurhrlaub opened this issue Oct 8, 2024 · 3 comments
Labels
needs info / can't replicate · possible bug

Comments

@rurhrlaub

How are you running AnythingLLM?

AnythingLLM desktop app

What happened?

[screenshot: anythingllm_context]

When using "Base" as the "Performance Mode", the Max Tokens setting is ignored and Llama 3.1 is invoked with an 8K context size. When setting Performance Mode to "Maximum", the Max Tokens setting is also ignored and Llama 3.1 is invoked with a 128K context size. I created a Modelfile to enforce a 32K context size, but the result was still 128K. The workspace was set to use the system-defined LLM settings.
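For reference, the Modelfile was along these lines (a sketch; the model tag and name here are illustrative, not necessarily the exact ones I used):

```sh
# Hypothetical Modelfile intended to pin the context window to 32K
cat > Modelfile <<'EOF'
FROM llama3.1:8b
PARAMETER num_ctx 32768
EOF
ollama create llama3.1-32k -f Modelfile
```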

Are there known steps to reproduce?

See above

@rurhrlaub added the possible bug label Oct 8, 2024
@rurhrlaub
Author

AnythingLLM v1.6.7

@timothycarambat
Member

> When using "Base" as the "Performance Mode", the Max Tokens setting is ignored and Llama 3.1 is invoked with an 8K context size.

This is normal and expected. See this from Ollama: ollama/ollama#1005 (comment)
And you can see what we did in this conversation: #1991

> When setting Performance Mode to "Maximum", the Max Tokens setting is ignored and Llama 3.1 is invoked with a 128K context size. I created a Modelfile to enforce a 32K context size, but the result was still 128K.

Any parameters passed into the API will override whatever is in a Modelfile in Ollama:
https://github.com/Mintplex-Labs/anything-llm/pull/2014/files#diff-df0e7523cd11db44d61e29cfb54f0bdc2ace72ffcf18abeca888d299efd2d738R37-R40
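To make that concrete, here is a minimal sketch against the plain Ollama HTTP API (model name and values are illustrative): an explicit num_ctx in the request options takes precedence over the Modelfile's PARAMETER num_ctx.

```sh
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1-32k",
  "prompt": "Hello",
  "options": { "num_ctx": 131072 }
}'
# The server sizes its context from options.num_ctx (128K here),
# not from the 32K baked into the Modelfile.
```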

So here, we would be passing in whatever value you have set for Max Tokens in the UI. Where are you seeing 128K, and how?

@timothycarambat added the needs info / can't replicate label Oct 9, 2024
@rurhrlaub
Author

Looking at the --ctx-size parameter in the shell, it is always 8K or 128K, never 32K. The 128K setting is too large to execute, and 8K always truncates the data in context, producing incomplete results:

```
814339519 42877 42374 4004 0 31 0 35095332 14060 - S 0 ?? 0:01.61 /Applications/AnythingLLM.app/Contents/Resources/ollama/llm serve
814339519 63122 42877 4004 0 31 0 40038304 5776448 - S 0 ?? 17:02.59 /var/folders/p2/4xbgs9lx7lvdxsffq0v123h0r8lndz/T/ollama4011639316/runners/cpu_avx2/ollama_llama_server --model /Users/ruhrlaub/Library/Application Support/anythingllm-desktop/storage/models/ollama/blobs/sha256-87048bcd55216712ef14c11c2c303728463207b165bf18440b9b84b07ec00f87 --ctx-size 8192 --batch-size 512 --embedding --log-disable --no-mmap --mlock --parallel 4 --port 57041
```
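For reference, this is how I'm pulling the flag out of the process table (plain ps and grep on macOS, nothing AnythingLLM-specific):

```sh
# Print just the --ctx-size value of the running Ollama runner
ps axww | grep ollama_llama_server | grep -oE -- '--ctx-size [0-9]+'
```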
