
[Chore]: bump node-llama-cpp to 3.1.1 to support newer models #2446

Open
PeterTucker opened this issue Oct 9, 2024 · 1 comment
Assignees: timothycarambat
Labels: enhancement (New feature or request), feature request

@PeterTucker

How are you running AnythingLLM?

Docker (local)

What happened?

Docker sees my models. I start chatting in my workspace, and then I get the error "Failed to load model".

anythingllm  | llama_model_loader: loaded meta data with 36 key-value pairs and 256 tensors from /app/server/storage/models/downloaded/Llama-3.2-3B-Instruct-uncensored.Q8_0.gguf (version GGUF V3 (latest))
anythingllm  | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
anythingllm  | llama_model_loader: - kv   0:                       general.architecture str              = llama
anythingllm  | llama_model_loader: - kv   1:                               general.type str              = model
anythingllm  | llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 3B Instruct
anythingllm  | llama_model_loader: - kv   3:                       general.organization str              = Meta Llama
anythingllm  | llama_model_loader: - kv   4:                           general.finetune str              = Instruct
anythingllm  | llama_model_loader: - kv   5:                           general.basename str              = Llama-3.2
anythingllm  | llama_model_loader: - kv   6:                         general.size_label str              = 3B
anythingllm  | llama_model_loader: - kv   7:                          llama.block_count u32              = 28
anythingllm  | llama_model_loader: - kv   8:                       llama.context_length u32              = 131072
anythingllm  | llama_model_loader: - kv   9:                     llama.embedding_length u32              = 3072
anythingllm  | llama_model_loader: - kv  10:                  llama.feed_forward_length u32              = 8192
anythingllm  | llama_model_loader: - kv  11:                 llama.attention.head_count u32              = 24
anythingllm  | llama_model_loader: - kv  12:              llama.attention.head_count_kv u32              = 8
anythingllm  | llama_model_loader: - kv  13:                       llama.rope.freq_base f32              = 500000.000000
anythingllm  | llama_model_loader: - kv  14:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
anythingllm  | llama_model_loader: - kv  15:                 llama.attention.key_length u32              = 128
anythingllm  | llama_model_loader: - kv  16:               llama.attention.value_length u32              = 128
anythingllm  | llama_model_loader: - kv  17:                          general.file_type u32              = 7
anythingllm  | llama_model_loader: - kv  18:                           llama.vocab_size u32              = 128256
anythingllm  | llama_model_loader: - kv  19:                 llama.rope.dimension_count u32              = 128
anythingllm  | llama_model_loader: - kv  20:                       tokenizer.ggml.model str              = gpt2
anythingllm  | llama_model_loader: - kv  21:                         tokenizer.ggml.pre str              = llama-bpe
anythingllm  | llama_model_loader: - kv  22:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
anythingllm  | llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
anythingllm  | llama_model_loader: - kv  24:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
anythingllm  | llama_model_loader: - kv  25:                tokenizer.ggml.bos_token_id u32              = 128000
anythingllm  | llama_model_loader: - kv  26:                tokenizer.ggml.eos_token_id u32              = 128009
anythingllm  | llama_model_loader: - kv  27:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
anythingllm  | llama_model_loader: - kv  28:               general.quantization_version u32              = 2
anythingllm  | llama_model_loader: - kv  29:                                general.url str              = https://huggingface.co/mradermacher/L...
anythingllm  | llama_model_loader: - kv  30:              mradermacher.quantize_version str              = 2
anythingllm  | llama_model_loader: - kv  31:                  mradermacher.quantized_by str              = mradermacher
anythingllm  | llama_model_loader: - kv  32:                  mradermacher.quantized_at str              = 2024-09-28T01:03:04+02:00
anythingllm  | llama_model_loader: - kv  33:                  mradermacher.quantized_on str              = db3
anythingllm  | llama_model_loader: - kv  34:                         general.source.url str              = https://huggingface.co/chuanli11/Llam...
anythingllm  | llama_model_loader: - kv  35:                  mradermacher.convert_type str              = hf
anythingllm  | llama_model_loader: - type  f32:   58 tensors
anythingllm  | llama_model_loader: - type q8_0:  198 tensors
anythingllm  | llm_load_vocab: special tokens definition check successful ( 256/128256 ).
anythingllm  | llm_load_print_meta: format           = GGUF V3 (latest)
anythingllm  | llm_load_print_meta: arch             = llama
anythingllm  | llm_load_print_meta: vocab type       = BPE
anythingllm  | llm_load_print_meta: n_vocab          = 128256
anythingllm  | llm_load_print_meta: n_merges         = 280147
anythingllm  | llm_load_print_meta: n_ctx_train      = 131072
anythingllm  | llm_load_print_meta: n_embd           = 3072
anythingllm  | llm_load_print_meta: n_head           = 24
anythingllm  | llm_load_print_meta: n_head_kv        = 8
anythingllm  | llm_load_print_meta: n_layer          = 28
anythingllm  | llm_load_print_meta: n_rot            = 128
anythingllm  | llm_load_print_meta: n_embd_head_k    = 128
anythingllm  | llm_load_print_meta: n_embd_head_v    = 128
anythingllm  | llm_load_print_meta: n_gqa            = 3
anythingllm  | llm_load_print_meta: n_embd_k_gqa     = 1024
anythingllm  | llm_load_print_meta: n_embd_v_gqa     = 1024
anythingllm  | llm_load_print_meta: f_norm_eps       = 0.0e+00
anythingllm  | llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
anythingllm  | llm_load_print_meta: f_clamp_kqv      = 0.0e+00
anythingllm  | llm_load_print_meta: f_max_alibi_bias = 0.0e+00
anythingllm  | llm_load_print_meta: f_logit_scale    = 0.0e+00
anythingllm  | llm_load_print_meta: n_ff             = 8192
anythingllm  | llm_load_print_meta: n_expert         = 0
anythingllm  | llm_load_print_meta: n_expert_used    = 0
anythingllm  | llm_load_print_meta: causal attn      = 1
anythingllm  | llm_load_print_meta: pooling type     = 0
anythingllm  | llm_load_print_meta: rope type        = 0
anythingllm  | llm_load_print_meta: rope scaling     = linear
anythingllm  | llm_load_print_meta: freq_base_train  = 500000.0
anythingllm  | llm_load_print_meta: freq_scale_train = 1
anythingllm  | llm_load_print_meta: n_yarn_orig_ctx  = 131072
anythingllm  | llm_load_print_meta: rope_finetuned   = unknown
anythingllm  | llm_load_print_meta: ssm_d_conv       = 0
anythingllm  | llm_load_print_meta: ssm_d_inner      = 0
anythingllm  | llm_load_print_meta: ssm_d_state      = 0
anythingllm  | llm_load_print_meta: ssm_dt_rank      = 0
anythingllm  | llm_load_print_meta: model type       = ?B
anythingllm  | llm_load_print_meta: model ftype      = Q8_0
anythingllm  | llm_load_print_meta: model params     = 3.61 B
anythingllm  | llm_load_print_meta: model size       = 3.57 GiB (8.50 BPW)
anythingllm  | llm_load_print_meta: general.name     = Llama 3.2 3B Instruct
anythingllm  | llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
anythingllm  | llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
anythingllm  | llm_load_print_meta: LF token         = 128 'Ä'
anythingllm  | llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
anythingllm  | llm_load_tensors: ggml ctx size =    0.13 MiB
anythingllm  | llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 256, got 255
anythingllm  | llama_load_model_from_file: failed to load model
anythingllm  | [backend] error: Error: Failed to load model
anythingllm  |     at new LlamaModel (file:///app/server/node_modules/node-llama-cpp/dist/llamaEvaluator/LlamaModel.js:50:23)
anythingllm  |     at createLlamaModel (file:///app/server/node_modules/@langchain/community/dist/utils/llama_cpp.js:10:12)
anythingllm  |     at new ChatLlamaCpp (file:///app/server/node_modules/@langchain/community/dist/chat_models/llama_cpp.js:94:23)
anythingllm  |     at /app/server/utils/AiProviders/native/index.js:12:27
anythingllm  |     at async #initializeLlamaModel (/app/server/utils/AiProviders/native/index.js:45:33)
anythingllm  |     at async #llamaClient (/app/server/utils/AiProviders/native/index.js:57:5)
anythingllm  |     at async NativeLLM.streamGetChatCompletion (/app/server/utils/AiProviders/native/index.js:134:19)
anythingllm  |     at async streamChatWithWorkspace (/app/server/utils/chats/stream.js:241:20)
anythingllm  |     at async /app/server/endpoints/chat.js:86:9

Are there known steps to reproduce?

No response

@PeterTucker added the "possible bug" (Bug was reported but is not confirmed or is unable to be replicated) label on Oct 9, 2024

PeterTucker commented Oct 9, 2024

Looks like the issue is that node-llama-cpp needs to be bumped from "^2.8.0" to "3.1.1" for the "Llama 3.2 3B" model to work. I tried updating, but there are breaking changes between the two node-llama-cpp versions.
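For context, the version pin I tried changing lives in server/package.json (path inferred from the /app/server/node_modules paths in the traces, so treat it as an assumption):

```diff
-    "node-llama-cpp": "^2.8.0",
+    "node-llama-cpp": "3.1.1",
```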

When the model is being initialized:

anythingllm  | [backend] error: TypeError: Cannot destructure property '_llama' of 'undefined' as it is undefined.
anythingllm  |     at new LlamaModel (file:///app/server/node_modules/node-llama-cpp/dist/evaluator/LlamaModel/LlamaModel.js:42:144)
anythingllm  |     at createLlamaModel (file:///app/server/node_modules/@langchain/community/dist/utils/llama_cpp.js:10:12)
anythingllm  |     at new ChatLlamaCpp (file:///app/server/node_modules/@langchain/community/dist/chat_models/llama_cpp.js:94:23)
anythingllm  |     at /app/server/utils/AiProviders/native/index.js:12:27
anythingllm  |     at async #initializeLlamaModel (/app/server/utils/AiProviders/native/index.js:45:33)
anythingllm  |     at async #llamaClient (/app/server/utils/AiProviders/native/index.js:57:5)
anythingllm  |     at async NativeLLM.streamGetChatCompletion (/app/server/utils/AiProviders/native/index.js:134:19)
anythingllm  |     at async streamChatWithWorkspace (/app/server/utils/chats/stream.js:241:20)
anythingllm  |     at async /app/server/endpoints/chat.js:86:9
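For reference, v3 replaced the v2 constructor-based API with a factory (getLlama() → llama.loadModel(...)), which is presumably why @langchain/community's createLlamaModel, which still calls new LlamaModel({...}) directly, hits the destructuring error above. Here's a minimal sketch of model loading under the v3 API, using the model path from my log; this is based on the node-llama-cpp migration docs and is untested inside AnythingLLM's native provider:

```ts
import {getLlama, LlamaChatSession} from "node-llama-cpp";

// v3 loads models through a Llama instance instead of `new LlamaModel({...})`,
// which is the v2-style call @langchain/community still makes.
const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "/app/server/storage/models/downloaded/Llama-3.2-3B-Instruct-uncensored.Q8_0.gguf",
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
});

console.log(await session.prompt("Hello!"));
```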

@timothycarambat changed the title from "[BUG]: Can't run Native model in Docker" to "[Chore]: bump node-llama-cpp to 3.1.1 to support newer models" on Oct 9, 2024
@timothycarambat added the "enhancement" (New feature or request) and "feature request" labels and removed the "possible bug" label on Oct 9, 2024
@timothycarambat self-assigned this on Oct 9, 2024
@Mintplex-Labs deleted a comment from coldheartai on Oct 11, 2024