Python Implementation Incompatible with Node.js Document Storage prefix path #699

Seigneurhol · 2025-01-17T16:49:44Z

Description

Looking at the error logs and your Node.js implementation, I notice a key difference in how documents are stored and retrieved. The Python implementation assumes documents are stored with a "documents/" prefix, while the Node.js implementation appears to store them directly in the root of the bucket.
This mismatch breaks retrieval when the Python implementation reads documents written by the Node.js implementation. Specifically, the Python implementation looks for documents at paths like:

documents/49689d38e7eb
The Node.js implementation, however, stores them as:

49689d38e7eb
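To make the mismatch concrete, here is a minimal sketch of how an object name could be derived from a prefix and a document ID. `blob_name` is a hypothetical helper, not the library's API. It follows the `{prefix}/{id}` pattern from the `GCSDocumentStorage` docstring and, as an assumption, treats an empty or `None` prefix as "store at the bucket root", which yields the Node.js-style name.

```python
from typing import Optional


def blob_name(document_id: str, prefix: Optional[str] = "documents") -> str:
    """Hypothetical helper: build a GCS object name as {prefix}/{id}.

    None, "" and "/" are all treated as "no prefix", so the object is
    addressed at the bucket root (the layout the Node.js side uses).
    """
    cleaned = (prefix or "").strip("/")
    if not cleaned:
        return document_id
    return f"{cleaned}/{document_id}"


print(blob_name("49689d38e7eb"))               # → documents/49689d38e7eb
print(blob_name("49689d38e7eb", prefix=None))  # → 49689d38e7eb
```

Note that a naive `f"{prefix}/{document_id}"` gives `/49689d38e7eb` for `prefix=""` and `None/49689d38e7eb` for `prefix=None`, and GCS treats both as object names distinct from `49689d38e7eb`. Whether simply removing the default is enough therefore depends on how the library actually builds the name.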

Suggested Fix

Update the Python implementation to match the Node.js storage pattern by removing the "documents/" prefix:

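```python
from typing import Optional

from google.cloud import storage
from langchain_google_vertexai.vectorstores.document_storage import DocumentStorage


class GCSDocumentStorage(DocumentStorage):
    """Stores documents in Google Cloud Storage.

    For each (id, document_text) pair, the text is stored in plain text
    format in a blob named {prefix}/{id}.
    """

    def __init__(
        self,
        bucket: storage.Bucket,
        # Proposed change: drop the "documents" default so blobs are read
        # from the bucket root, matching the Node.js implementation.
        prefix: Optional[str] = "documents",
        threaded=True,
        n_threads=8,
    ) -> None:
        ...  # existing constructor body unchanged
```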

Alternatively, make it possible to pass a GCSDocumentStorage instance instead of just the bucket name.

Let me know if you need additional details or logs to debug this further.
Thank you!


lkuligin · 2025-01-18T06:20:48Z

Could you add a link to the Node.js implementation you mentioned, please?

You can still instantiate VectorSearchVectorStore directly and set the prefix yourself, since GCSDocumentStorage accepts a prefix argument. We could also add a prefix argument to the from_components method and pass it through to GCSDocumentStorage, which is essentially the second option you suggested. Please feel free to send a PR.
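A rough sketch of the proposed change: a factory that accepts an optional prefix and forwards it to the storage object instead of relying on the hard-coded default. Every name here (`DictDocumentStorage`, `from_components_sketch`, `document_storage_prefix`, `mset`/`mget`) is hypothetical. The real `from_components` and `GCSDocumentStorage` signatures may differ, and a plain dict stands in for the GCS bucket so the sketch runs without credentials.

```python
from typing import Dict, List, Optional, Tuple


class DictDocumentStorage:
    """Hypothetical stand-in for GCSDocumentStorage: a dict plays the bucket."""

    def __init__(self, bucket: Dict[str, str], prefix: Optional[str] = "documents") -> None:
        self._bucket = bucket
        # Normalize so None, "" and "/" all mean "bucket root".
        self._prefix = (prefix or "").strip("/")

    def _blob_name(self, document_id: str) -> str:
        return f"{self._prefix}/{document_id}" if self._prefix else document_id

    def mset(self, items: List[Tuple[str, str]]) -> None:
        for document_id, text in items:
            self._bucket[self._blob_name(document_id)] = text

    def mget(self, document_ids: List[str]) -> List[Optional[str]]:
        return [self._bucket.get(self._blob_name(i)) for i in document_ids]


def from_components_sketch(
    bucket: Dict[str, str],
    document_storage_prefix: Optional[str] = "documents",
) -> DictDocumentStorage:
    """Hypothetical factory: forwards the prefix instead of hard-coding it."""
    return DictDocumentStorage(bucket, prefix=document_storage_prefix)


# A bucket populated by the Node.js side: objects at the root, no prefix.
bucket = {"49689d38e7eb": "some document text"}

python_default = from_components_sketch(bucket)
print(python_default.mget(["49689d38e7eb"]))  # → [None]

node_compatible = from_components_sketch(bucket, document_storage_prefix=None)
print(node_compatible.mget(["49689d38e7eb"]))  # → ['some document text']
```

Keeping `"documents"` as the default preserves current behaviour for existing Python users, while callers sharing a bucket with the Node.js implementation can opt into the root layout explicitly.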
