Looking at the error logs and your Node.js implementation, I notice a key difference in how documents are stored and retrieved. The Python implementation assumes documents are stored with a "documents/" prefix, while the Node.js implementation appears to store them directly in the root of the bucket.
This mismatch causes issues when the Python implementation attempts to retrieve documents. Specifically, the Python implementation is searching for documents at paths like:
documents/49689d38e7eb
While the Node.js implementation is storing them as:
49689d38e7eb
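To make the mismatch concrete, here is a minimal sketch of the naming rule. `blob_name` is a hypothetical helper written for illustration, not the library's actual code, and the dict stands in for a bucket that the Node.js side has written to:

```python
from typing import Optional


def blob_name(doc_id: str, prefix: Optional[str] = "documents") -> str:
    """Hypothetical helper: build the object name for a document id."""
    if not prefix:
        # No prefix: the object lives at the bucket root.
        return doc_id
    return f"{prefix}/{doc_id}"


# Stand-in for a bucket written by the Node.js side (ids at the bucket root).
bucket_contents = {"49689d38e7eb": "some document text"}

python_lookup = blob_name("49689d38e7eb")             # default "documents" prefix
root_lookup = blob_name("49689d38e7eb", prefix=None)  # no prefix

print(python_lookup, python_lookup in bucket_contents)
print(root_lookup, root_lookup in bucket_contents)
```

With the default prefix the lookup key does not exist in the bucket, while the unprefixed key does.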
Suggested Fix
Update the Python implementation to match the Node.js storage pattern by removing the "documents/" prefix:
    class GCSDocumentStorage(DocumentStorage):
        """Stores documents in Google Cloud Storage.

        For each pair id, document_text the name of the blob will be {prefix}/{id}
        stored in plain text format.
        """

        def __init__(
            self,
            bucket: storage.Bucket,
            prefix: Optional[str] = "documents",  # Remove "documents" here
            threaded=True,
            n_threads=8,
        ) -> None:
            ...
Alternatively, it would help to be able to pass a GCSDocumentStorage instance instead of the bucket name.
Let me know if you need additional details or logs to debug this further.
Thank you!
Could you please add a link to the Node.js implementation you mentioned?
You can still instantiate VectorSearchVectorStore directly and change the prefix, since GCSDocumentStorage has a prefix argument. We could also add a prefix argument to the from_components method and pass it through to GCSDocumentStorage.
That is essentially the second option you suggested. Please feel free to send a PR.
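For anyone picking this up, a rough sketch of what forwarding a prefix through a factory method could look like. Everything here is a stand-in: `SketchDocumentStorage` and `SketchVectorStore` are hypothetical classes backed by a dict, and the real `from_components` takes other arguments that are omitted.

```python
from typing import Dict, Optional


class SketchDocumentStorage:
    """Stand-in for GCSDocumentStorage, backed by a dict instead of a bucket."""

    def __init__(self, bucket: Dict[str, str], prefix: Optional[str] = "documents") -> None:
        self._bucket = bucket
        self._prefix = prefix

    def _blob_name(self, doc_id: str) -> str:
        # Assumed naming rule: "{prefix}/{id}", or just "{id}" without a prefix.
        return f"{self._prefix}/{doc_id}" if self._prefix else doc_id

    def get(self, doc_id: str) -> Optional[str]:
        return self._bucket.get(self._blob_name(doc_id))


class SketchVectorStore:
    """Stand-in for the vector store; only the storage wiring is shown."""

    def __init__(self, document_storage: SketchDocumentStorage) -> None:
        self.document_storage = document_storage

    @classmethod
    def from_components(
        cls, bucket: Dict[str, str], prefix: Optional[str] = "documents"
    ) -> "SketchVectorStore":
        # The proposed change: forward the prefix instead of relying on the default.
        return cls(SketchDocumentStorage(bucket, prefix=prefix))


bucket = {"49689d38e7eb": "written by the Node.js side"}
default_store = SketchVectorStore.from_components(bucket)
root_store = SketchVectorStore.from_components(bucket, prefix=None)
```

Keeping "documents" as the default would leave existing callers unaffected, while `prefix=None` would let the Python side read objects stored at the bucket root.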