Hey there! I'm here to help you out with any bugs, questions, or contributions you have in mind. Let's tackle this issue together while we wait for a human maintainer to join in.

The process crashes when multiple `stream()` calls run concurrently on a single `ChatLlamaCpp` instance. To handle multiple concurrent requests correctly, you should ensure that each request is isolated and does not interfere with the others. One way to achieve this is by creating a new instance of `ChatLlamaCpp` for each request.

Here is an example of how you can handle multiple concurrent requests by creating separate instances of `ChatLlamaCpp`:

```js
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { ChatPromptTemplate } from "@langchain/core/prompts";

const modelPath = "/Users/donggi/Library/Application Support/amigo/models/_Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf";

async function handleConcurrentRequests(requests) {
  const results = await Promise.all(
    requests.map(async (request) => {
      // A fresh model instance per request keeps the underlying
      // llama.cpp session state isolated between concurrent streams.
      const model = new ChatLlamaCpp({ modelPath, temperature: 0.5 });
      const template = ChatPromptTemplate.fromMessages([
        ["system", "answer the question."],
        ["human", "{input}"],
      ]);
      const prompt = await template.invoke({ input: request });
      const stream = await model.stream(prompt);
      const chunks = [];
      for await (const chunk of stream) {
        chunks.push(chunk.content);
      }
      return chunks;
    })
  );
  return results;
}

// Example usage
const requests = [
  "hi. what is your name?",
  "Tell me a short story about a happy Llama.",
];

handleConcurrentRequests(requests).then((results) => {
  console.log(results);
});
```

This code creates a new `ChatLlamaCpp` instance for each request, so each request is processed in isolation and the concurrent streams do not interfere with each other.
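If creating a new model per request is too costly (each instance loads the GGUF weights separately), an alternative is to reuse one instance and make sure the `stream()` calls never overlap. Below is a minimal sketch of a generic promise-queue pattern under that assumption; the `enqueue` helper and the model path are illustrative, not part of the LangChain.js API:

```js
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

// Shared instance; the path is a placeholder.
const sharedModel = new ChatLlamaCpp({ modelPath: "/path/to/model.gguf", temperature: 0.5 });

// Each task is chained onto the previous one, so at most one
// stream() call is ever in flight on the shared instance.
let queue = Promise.resolve();

function enqueue(task) {
  const result = queue.then(task, task);
  // Swallow errors on the chain itself so one failure doesn't block later tasks.
  queue = result.then(() => undefined, () => undefined);
  return result;
}

async function answer(input) {
  return enqueue(async () => {
    const chunks = [];
    for await (const chunk of await sharedModel.stream(input)) {
      chunks.push(chunk.content);
    }
    return chunks.join("");
  });
}

// Requests may arrive concurrently, but generations run one at a time.
Promise.all([answer("hi"), answer("tell me a joke")]).then(console.log);
```

The trade-off is throughput: generations are serialized, so latency grows under load, but memory stays flat and the overlapping-stream crash is avoided.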
Description
I want to run the Gemma model using `node-llama-cpp` through `langchain.js` (`ChatLlamaCpp`) and expose the model's `stream()` method as an API to clients. It should handle multiple concurrent requests: I expect that even when multiple `stream()` calls are made in parallel on a single `ChatLlamaCpp` instance, streaming will work correctly.

A single `stream()` call works fine, but as soon as two `stream()` calls are made simultaneously, the process crashes.
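For reference, a minimal sketch of the call pattern that triggers the crash; the model path and prompts here are placeholders rather than the exact code:

```js
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";

const model = new ChatLlamaCpp({ modelPath: "/path/to/model.gguf" });

// Drain one streamed response to completion.
async function consume(input) {
  for await (const chunk of await model.stream(input)) {
    process.stdout.write(String(chunk.content));
  }
}

// One call works; two overlapping calls on the same instance crash the process.
Promise.all([consume("hi"), consume("tell me a story")]).catch(console.error);
```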
System Info

dependencies:
platform: macOS 14.3.1, M1 Max (32 GB)
node: v18.18.0
yarn: v1.22.19