Improved handling of multidoc insertion errors for vector store #110
This PR alleviates a problem with the kind of error that is raised under different failure scenarios during AstraDBVectorStore insertions.
(Do not mind the CI failing. There is some bug with the CI actions related to secrets not trickling down to the test workflows. I ran the tests locally and everything related went smoothly.) <== Never mind, those CI problems have somehow vanished (by themselves?).
Current problems:
The insertion is attempted assuming new document _ids. If it fails, the (detailed) errors from astrapy are inspected.
(note: "errors", plural, since a single insertion can result in a number of errors from several documents).
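The attempt-then-inspect flow described above could look roughly like the sketch below. It does not use astrapy: `FakeInsertManyError`, `fake_insert_many`, `DocError` and the error-code strings are hypothetical stand-ins chosen for illustration, meant only to show how a single failed insertion can carry several per-document errors.

```python
from dataclasses import dataclass


@dataclass
class DocError:
    """Hypothetical per-document error descriptor."""
    doc_id: str
    code: str


class FakeInsertManyError(Exception):
    """Hypothetical stand-in for a multi-document insertion failure."""

    def __init__(self, errors, inserted_ids):
        super().__init__(f"{len(errors)} document(s) failed")
        self.errors = errors
        self.inserted_ids = inserted_ids


def fake_insert_many(store, docs):
    """Insert docs into a dict-backed store, collecting one error per bad doc."""
    errors, inserted = [], []
    for doc in docs:
        if doc["_id"] in store:
            errors.append(DocError(doc["_id"], "DOCUMENT_ALREADY_EXISTS"))
        elif "text" not in doc:
            errors.append(DocError(doc["_id"], "INVALID_DOCUMENT"))
        else:
            store[doc["_id"]] = doc
            inserted.append(doc["_id"])
    if errors:
        raise FakeInsertManyError(errors, inserted)
    return inserted


def inspect_insertion(store, docs):
    """Attempt the insertion assuming new _ids; on failure, group ids by error code."""
    try:
        fake_insert_many(store, docs)
        return {}
    except FakeInsertManyError as err:
        by_code = {}
        for doc_error in err.errors:
            by_code.setdefault(doc_error.code, []).append(doc_error.doc_id)
        return by_code
```

Grouping by error code makes it easy to tell a failure caused only by pre-existing _ids apart from one that also involves other kinds of errors.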
Currently, this exception is the one received from astrapy, which has two flaws:
These two issues can combine, leading to a scenario where all the user sees is a "doc already exists" error, adding to the confusion.
Also it can be argued that it is leaky, in this case, to expose an astrapy exception as is to the user.
What this PR does
The change is for case 2 above.
A ValueError is raised (and not an astrapy.exception.InsertManyException anymore, which is slightly breaking for try-catch code).
Below is a Python script exemplifying various insertion scenarios and the before- and after- kind of exception a user would get.
Note:
Right now, all of the errors end up in the error message, even if there are very many of them. I wonder if it would be better to put an upper limit on this and avoid extremely long strings ending up in logs etc., with little added value over a cap of, say, 20 errors.
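One possible shape for such a cap is sketched below. This is not the code in the PR: `build_error_message`, `raise_as_value_error`, the message wording and the `max_errors` parameter are all hypothetical names introduced here, and the default of 20 simply mirrors the figure floated above.

```python
def build_error_message(error_messages, max_errors=20):
    """Join per-document error messages, showing at most max_errors of them."""
    shown = error_messages[:max_errors]
    omitted = len(error_messages) - len(shown)
    lines = [f"Cannot insert documents. {len(error_messages)} error(s) found:"]
    lines.extend(f"  [{i}] {msg}" for i, msg in enumerate(shown, start=1))
    if omitted > 0:
        # Report how many errors were left out instead of listing them all.
        lines.append(f"  ... and {omitted} more error(s) omitted")
    return "\n".join(lines)


def raise_as_value_error(original_exc, error_messages, max_errors=20):
    """Raise a ValueError with the capped message, chaining the original exception."""
    message = build_error_message(error_messages, max_errors)
    raise ValueError(message) from original_exc
```

Chaining with `from` keeps the original exception reachable through `__cause__`, so the full list of errors remains available to callers who need it even when the message itself is truncated.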
Demo script (before-and-after)