[BUG]: llama3.1 8B Context Size Max Tokens Ignored in Both Performance Modes #2442

Open
rurhrlaub opened this issue Oct 8, 2024 · 3 comments
Labels
needs info / can't replicate · possible bug

Comments

@rurhrlaub

How are you running AnythingLLM?

AnythingLLM desktop app

What happened?

[screenshot: anythingllm_context]

When using "Base" as the "Performance Mode", the Max Tokens setting is ignored and Llama 3.1 is invoked with an 8K context size. When setting Performance Mode to "Maximum", the Max Tokens setting is also ignored and Llama 3.1 is invoked with a 128K context size. I created a Modelfile to enforce a 32K context size, but the result was still 128K. The workspace was set to use the system-defined LLM settings.
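For reference, the Modelfile was along these lines (a sketch; the model tag and name here are illustrative, not necessarily the exact ones I used):

```sh
# Hypothetical Modelfile intended to pin the context window to 32K
cat > Modelfile <<'EOF'
FROM llama3.1:8b
PARAMETER num_ctx 32768
EOF
ollama create llama3.1-32k -f Modelfile
```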

Are there known steps to reproduce?

See above

@rurhrlaub added the possible bug label Oct 8, 2024
@rurhrlaub
Author

AnythingLLM v1.6.7

@timothycarambat
Member

> When using "Base" as the "Performance Mode", the Max Tokens setting is ignored and Llama 3.1 is invoked with an 8K context size.

This is normal and expected. See this from Ollama: ollama/ollama#1005 (comment)
And you can see what we did in this conversation: #1991

> When setting Performance Mode to "Maximum", the Max Tokens setting is ignored and Llama 3.1 is invoked with a 128K context size. I created a Modelfile to enforce a 32K context size, but the result was still 128K.

Any parameters passed into the API will override whatever is in a Modelfile in Ollama:
https://github.com/Mintplex-Labs/anything-llm/pull/2014/files#diff-df0e7523cd11db44d61e29cfb54f0bdc2ace72ffcf18abeca888d299efd2d738R37-R40
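To make that concrete, here is a minimal sketch against the plain Ollama HTTP API (model name and values are illustrative): an explicit num_ctx in the request options takes precedence over the Modelfile's PARAMETER num_ctx.

```sh
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1-32k",
  "prompt": "Hello",
  "options": { "num_ctx": 131072 }
}'
# The server sizes its context from options.num_ctx (128K here),
# not from the 32K baked into the Modelfile.
```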

So here, we would be passing in whatever value you have set for Max Tokens in the UI. Where are you seeing 128K, and how?

@timothycarambat added the needs info / can't replicate label Oct 9, 2024
@rurhrlaub
Author

Looking at the --ctx-size parameter in the shell, it is always 8K or 128K, never 32K. The 128K setting is too large to execute, and 8K always truncates the data in context, producing incomplete results:

```
814339519 42877 42374 4004 0 31 0 35095332 14060 - S 0 ?? 0:01.61 /Applications/AnythingLLM.app/Contents/Resources/ollama/llm serve
814339519 63122 42877 4004 0 31 0 40038304 5776448 - S 0 ?? 17:02.59 /var/folders/p2/4xbgs9lx7lvdxsffq0v123h0r8lndz/T/ollama4011639316/runners/cpu_avx2/ollama_llama_server --model /Users/ruhrlaub/Library/Application Support/anythingllm-desktop/storage/models/ollama/blobs/sha256-87048bcd55216712ef14c11c2c303728463207b165bf18440b9b84b07ec00f87 --ctx-size 8192 --batch-size 512 --embedding --log-disable --no-mmap --mlock --parallel 4 --port 57041
```
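For reference, this is how I'm pulling the flag out of the process table (plain ps and grep on macOS, nothing AnythingLLM-specific):

```sh
# Print just the --ctx-size value of the running Ollama runner
ps axww | grep ollama_llama_server | grep -oE -- '--ctx-size [0-9]+'
```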
