Incremental index issue with ElasticVectorSearch #4853
Replies: 3 comments 1 reply
-
Based on the error message you provided, it seems like the issue is related to the Elasticsearch bulk delete operation: a delete request is issued even when there is nothing to delete, and Elasticsearch rejects an empty bulk request. When no documents need to be added, updated, or deleted, the expected result of the `index` call would instead be:

```json
{
  "num_added": 0,
  "num_deleted": 0,
  "num_skipped": 0,
  "num_updated": 0
}
```

This inference is based on the consistent pattern across tests where no modifications to the document set result in all counts being zero. I found a similar solved discussion that may be relevant.

You mentioned that you're using LangChain version 0.1.12. I wasn't able to find specific changes to the relevant code in that release, so updating to the latest version is worth trying. If the issue still persists after updating, it would be helpful to have more information about your setup: specifically, how you're initializing the vector store and the exact parameters you're passing to the `index` function.
-
@dosu-bot the code snippet provided gives you the whole scenario to be run. You can add this code just at the beginning of the async function, in case the index has not been created in Elastic:

```typescript
const indexName = 'myindex';
const exists = await client.indices.exists({ index: indexName });
if (!exists) {
  console.warn(`index ${indexName} does not exist. creating...`);
  await client.indices.create({ index: indexName });
}
```

On the other hand, if you check the code of the `index` function at https://github.com/langchain-ai/langchainjs/blob/c9d445b3cf29c3253699fa4a375db9c560b4c2d3/langchain/src/indexes/indexing.ts, the deletions for incremental cleanup are fetched and deleted without any guard, while for full cleanup the same deletions are fetched but at line 366 there is a guard on the deletions array having length 0, preventing the vector store delete from being called with an empty list. I think that guard makes the difference between crashing (incremental) and not crashing (full).
-
Hey @telekosmos, great to see you back with another insightful question! Hope you're doing well. Thank you for providing such a detailed issue report.

Based on the code and error message you've provided, it seems like the crash happens in incremental cleanup mode when the list of documents to delete is empty: the delete operation is still issued, and Elasticsearch rejects the resulting empty bulk request. In full cleanup mode, the same deletion list is computed, but a guard skips the delete when the list is empty.

This behavior is not explicitly defined as a bug or as intended in the LangChain JS framework. However, it would be more logical and efficient for the framework to handle such scenarios by checking whether the deletion list is empty before issuing the delete, as the full cleanup path already does.

I would recommend opening an issue in the LangChain JS repository to bring this to the attention of the maintainers. They might consider adding a check for an empty deletion list to the incremental cleanup path as well.

In the meantime, as a workaround, you could add a check in your code before calling the `index` function:

```typescript
// Get the list of documents to delete
const docsToDelete = await recordManager.listKeys({
  before: indexStartDt,
  groupIds: sourceIds,
});

// Only call the index function if there are documents to delete
if (docsToDelete.length > 0) {
  await index({
    docsSource: [doc1, doc2],
    recordManager,
    vectorStore,
    options: {
      cleanup: 'incremental',
      sourceIdKey: 'source',
    },
  });
}
```

I hope this helps! If you have any other questions or need further clarification, please let me know.
-
Checked other resources
Commit to Help
Example Code
Description
Hi there
Using the JavaScript API, trying to follow this article, I've observed that the `index(args: IndexArgs)` function crashes when there are no documents to delete in the vector store and the cleanup mode is `incremental`. Sample code is as posted above. The code above will crash as is with an error similar to:
I wonder if this is a bug or the intended behaviour, and whether the vector store has to be initialized in some particular way. I've read that this indexing workflow won't work properly if the store is pre-populated, but nothing about a crash in these circumstances.
Appreciate any help
System Info
macOS 14.3
Elastic