Replies: 2 comments 1 reply
-
🤖 Hi @hafsalm, great to see you again! I hope you're doing well. The "n_tokens <= n_batch" error you're encountering typically occurs when the number of tokens passed to the model in a single evaluation exceeds the configured batch size (n_batch). In the context of LangChainJS and the node-llama-cpp package, this could be related to the `maxTokens` value being set higher than what the model can handle. To resolve this issue, you need to ensure that the `maxTokens` value does not exceed the total number of tokens your model supports. Here's a simple way to check and limit `maxTokens` inside the `_call` method:

```typescript
async _call(
  prompt: string,
  // @ts-expect-error - TS6133: 'options' is declared but its value is never read.
  options?: this["ParsedCallOptions"]
): Promise<string> {
  try {
    // Placeholder: replace with the total number of tokens your model supports.
    const totalTokens = 2048;
    // Cap maxTokens so it never exceeds the model's limit.
    const maxTokens =
      this?.maxTokens && this?.maxTokens <= totalTokens ? this?.maxTokens : totalTokens;
    const promptOptions = {
      maxTokens,
      temperature: this?.temperature,
      topK: this?.topK,
      topP: this?.topP,
      trimWhitespaceSuffix: this?.trimWhitespaceSuffix,
    };
    const completion = await this._session.prompt(prompt, promptOptions);
    return completion;
  } catch (e) {
    throw new Error(`Error getting prompt completion: ${e}`);
  }
}
```

In this code, `maxTokens` is capped so that it never exceeds `totalTokens`, the total number of tokens your model supports. Please note that the `totalTokens` value above is a placeholder and needs to be replaced with the actual limit for your model. I hope this helps! Let me know if you have any other questions.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
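Beyond capping `maxTokens`, the batch size itself is often the real constraint: if the prompt handed to node-llama-cpp is longer than `n_batch` (commonly 512 by default), the assertion fires no matter how short the completion is. Below is a minimal sketch of raising it where the model is constructed. It assumes the `LlamaCpp` wrapper (imported from `@langchain/community/llms/llama_cpp`, or `langchain/llms/llama_cpp` in older releases) forwards `batchSize` and `contextSize` to node-llama-cpp; check the `LlamaCppInputs` of your installed version, and treat the model path as a placeholder.

```typescript
import { LlamaCpp } from "@langchain/community/llms/llama_cpp";

async function main() {
  const model = new LlamaCpp({
    // Placeholder path -- replace with your own GGUF model file.
    modelPath: "/path/to/your/model.gguf",
    // Assumption: these options are forwarded to node-llama-cpp's context.
    // n_batch must be at least as large as the number of tokens evaluated
    // at once, otherwise llama.cpp fails with "n_tokens <= n_batch".
    contextSize: 2048,
    batchSize: 1024,
    maxTokens: 256,
    temperature: 0.5,
  });

  const res = await model.invoke("Where do llamas come from?");
  console.log(res);
}

main().catch(console.error);
```

Prompts that carry injected context (retrieved documents, chat history) can blow past the default batch size quickly even when the requested completion is tiny, which is why capping `maxTokens` alone sometimes isn't enough.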
-
Hi, +1. I also have this issue and the provided solution doesn't work for me.
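For anyone else debugging this: one way to check whether the prompt itself (rather than the requested completion length) is what overflows the batch is to tokenize it directly and compare the count against `n_batch`. Here's a minimal sketch, assuming node-llama-cpp's v2-style API (`LlamaModel` / `LlamaContext` with an `encode()` method); the model path and batch size are placeholders for whatever you're actually running with.

```typescript
import { LlamaModel, LlamaContext } from "node-llama-cpp";

// Placeholder path -- replace with your own GGUF model file.
const model = new LlamaModel({ modelPath: "/path/to/your/model.gguf" });

// Assumption: the batch size your context is configured with.
const batchSize = 512;
const context = new LlamaContext({ model, batchSize });

const prompt = "...your full prompt, including any retrieved context...";
const tokenCount = context.encode(prompt).length;

console.log(`prompt tokens: ${tokenCount}, n_batch: ${batchSize}`);
if (tokenCount > batchSize) {
  console.log("The prompt alone exceeds n_batch; raise batchSize or shorten the prompt.");
}
```

If the count is already over the limit, raising `batchSize` (and, if needed, `contextSize`) where the context is created is a more direct fix than trimming `maxTokens`.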
-
Hi,
I'm using the node-llama-cpp package in my project and I've encountered an error that I'm having trouble resolving. The error message is n_tokens <= n_batch.
From what I understand, this error typically occurs when the number of tokens exceeds the batch size. However, I'm not sure how to fix this in my code. Could anyone provide some guidance on what might be causing this error and how to resolve it?
Here's my code (it's a LangChain.js runnable sequence; it works well with Ollama, but not with LlamaCpp):