Checked other resources
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
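# ----------- code to store data in vectordb ----------------
import os

# Import paths are assumed from the packages in the pip list under System Info.
from langchain_chroma import Chroma
from langchain_community.document_loaders import (
    CSVLoader, DirectoryLoader, Docx2txtLoader, JSONLoader,
    PDFPlumberLoader, RecursiveUrlLoader, TextLoader, WebBaseLoader,
)
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
# PPTXLoader and ExcelLoader are assumed to be custom loaders defined elsewhere.

ext_to_loader = {
    '.csv': CSVLoader,
    '.json': JSONLoader,
    '.txt': TextLoader,
    '.pdf': PDFPlumberLoader,
    '.docx': Docx2txtLoader,
    '.pptx': PPTXLoader,
    '.xlsx': ExcelLoader,
    '.xls': ExcelLoader,
    'single_page_url': WebBaseLoader,
    'all_urls_from_base_url': RecursiveUrlLoader,
    'directory': DirectoryLoader
}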

def get_loader_for_extension(file_path):
    _, ext = os.path.splitext(file_path)
    loader_class = ext_to_loader.get(ext.lower())
    if loader_class:
        return loader_class(file_path)
    else:
        print(f"Unsupported file extension: {ext}")
        return None

def normalize_documents(docs):
    # Flatten each document's page_content to a single string
    return [
        doc.page_content if isinstance(doc.page_content, str)
        else '\n'.join(doc.page_content) if isinstance(doc.page_content, list)
        else ''
        for doc in docs
    ]

def vectorestore_function(split_documents_with_metadata, user_vector_store_path):
    try:
        # Create vector store with metadata
        embeddings = OpenAIEmbeddings(
            model="text-embedding-ada-002",
            openai_api_key=OPENAI_API_KEY
        )
        vector_store = Chroma(
            embedding_function=embeddings,
            persist_directory=user_vector_store_path
        )
        vector_store.add_documents(documents=split_documents_with_metadata)
        return vector_store
    except Exception as e:
        print(f'Error in vectorestore_function {str(e)}')

loader = get_loader_for_extension(saved_file_path)
docs = loader.load()
normalized_docs = normalize_documents(docs)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size)
split_docs = text_splitter.create_documents(normalized_docs)
split_documents_with_metadata = [
    Document(page_content=document.page_content, metadata={"user_id": user_id, "doc_id": document_id})
    for document in split_docs
]
vectorestore_function(
    split_documents_with_metadata,
    user_vector_store_path
)
# Note: I use the same code above to add or update new data

# ----------- code for interaction with AI ----------------
def get_vector_store(user_vector_store_path):
    embeddings = OpenAIEmbeddings(
        model="text-embedding-ada-002",
        openai_api_key=OPENAI_API_KEY
    )
    vectorstore = Chroma(
        embedding_function=embeddings,
        persist_directory=user_vector_store_path
    )
    return vectorstore

document_id_list = [str(document_id) if isinstance(document_id, int) else document_id for document_id in document_id_list]
user_vector_store_path = os.path.join(VECTOR_STORE_PATH, user_id)
vectorstore = get_vector_store(user_vector_store_path)
retriever = vectorstore.as_retriever()
current_threshold = 0.25
retrieved_docs = []  # defined up front so the checks below work even if invoke() raises
try:
    # Configure filtering
    retriever.search_type = "similarity_score_threshold"
    retriever.search_kwargs = {
        "filter": {
            "$and": [
                {"user_id": user_id},
                {"doc_id": {"$in": document_id_list}}
            ]
        },
        "score_threshold": current_threshold,
        "k": 3
    }
    retrieved_docs = retriever.invoke(question)
except Exception as e:
    print(f'error: {str(e)}')
print(f"retrieved_docs : {retrieved_docs}")
if not retrieved_docs:
    return jsonify({'error': 'No relevant docs were retrieved.'}), 404
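To make the retriever configuration above easier to inspect in isolation, here is a minimal sketch that builds the same `search_kwargs` dictionary in a plain function. The name `build_search_kwargs` and its defaults are hypothetical and not part of the application; the dictionary shape is copied from the code above. One thing it makes visible is that integer document IDs are converted to strings before filtering, so the `doc_id` values stored in metadata would presumably need to be strings as well for the `$in` filter to match.

```python
from typing import Any, Dict, List, Union


def build_search_kwargs(
    user_id: str,
    document_ids: List[Union[int, str]],
    score_threshold: float = 0.25,
    k: int = 3,
) -> Dict[str, Any]:
    """Build the search_kwargs used by the retriever (hypothetical helper)."""
    # Same normalization as in the issue: ints become strings, other values pass through.
    doc_ids = [str(d) if isinstance(d, int) else d for d in document_ids]
    return {
        "filter": {
            "$and": [
                {"user_id": user_id},
                {"doc_id": {"$in": doc_ids}},
            ]
        },
        "score_threshold": score_threshold,
        "k": k,
    }


if __name__ == "__main__":
    # Print the filter that would be sent for one user and two documents.
    print(build_search_kwargs("user-1", [12, "13"]))
```

Logging this dictionary next to the metadata written at ingestion time would show whether the two sides agree on the type of `doc_id`.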
Error Message and Stack Trace (if applicable)
WARNING:langchain_core.vectorstores.base:No relevant docs were retrieved using the relevance score threshold 0.25
Description
I’m facing an issue with my live server. When a new user is created, a new vector database is generated, and everything works fine. If I add more data, it gets stored in the vector database, but I’m unable to retrieve the newly added data.
Interestingly, this issue does not occur in my local environment—it only happens on the live server. To make the new data retrievable, I have to execute pm2 reload "id", as my application is running with PM2. However, if another user is in the middle of a conversation when I reload PM2, the socket connection gets disconnected, disrupting their session.
Tech Stack:
Flutter – Used for the mobile application
Node.js – Used for the back office
Python – Handles data extraction, vector database creation, and conversations
The file download, embedding creation, and vector database updates are handled using Celery.
The server is set up with Apache, and PM2 is used to manage the application process.
Issue:
New data is added to the vector database but cannot be retrieved until pm2 reload "id" is executed.
Reloading PM2 disconnects active socket connections, affecting ongoing user conversations.
What I Want to Achieve:
I want to ensure that the system works seamlessly when a user adds or updates data in the vector database. The new data should be immediately accessible for conversations without requiring a PM2 reload.
In the back office, I am using Socket.IO to send status updates:
This message is successfully emitted, and users can start conversations after receiving it. However, I’m still facing the issue where newly added data is not retrievable until I reload PM2.
Question:
How can I ensure that the system updates the vector database dynamically without requiring a PM2 reload, while keeping active socket connections intact?
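As a diagnostic aid, here is a minimal sketch of a check that could help narrow this down. It assumes (unconfirmed) that the long-running web process keeps using a view of the store that was opened before the Celery worker wrote the new data. The helper names `latest_mtime` and `store_changed` are hypothetical and use only the standard library; they report whether any file under the persist directory has been modified since a given timestamp, and they do not touch Chroma itself.

```python
import os


def latest_mtime(store_dir: str) -> float:
    """Return the newest modification time of any file under store_dir (0.0 if none)."""
    newest = 0.0
    for root, _dirs, files in os.walk(store_dir):
        for name in files:
            newest = max(newest, os.path.getmtime(os.path.join(root, name)))
    return newest


def store_changed(store_dir: str, last_seen: float) -> bool:
    """True if something under store_dir was modified after last_seen."""
    return latest_mtime(store_dir) > last_seen
```

Recording `latest_mtime(user_vector_store_path)` when the vector store is opened, and logging `store_changed(...)` before each retrieval, would show whether the files on disk changed while the process kept returning old results.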
System Info
-------------------------------------------------- live server:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7571
CPU family: 23
Model: 1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
Stepping: 2
BogoMIPS: 4399.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma
cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save
Virtualization features:
Hypervisor vendor: KVM
Virtualization type: full
Caches (sum of all):
L1d: 32 KiB (1 instance)
L1i: 64 KiB (1 instance)
L2: 512 KiB (1 instance)
L3: 8 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0,1
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Mitigation; untrained return thunk; SMT vulnerable
Spec rstack overflow: Vulnerable: Safe RET, no microcode
Spec store bypass: Vulnerable
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Not affected
-------------------------------------------------- pip list:
Package Version
aiohappyeyeballs 2.4.4
aiohttp 3.11.11
aiosignal 1.3.2
amqp 5.3.1
annotated-types 0.7.0
anyio 4.8.0
asgiref 3.8.1
async-timeout 4.0.3
attrs 25.1.0
backoff 2.2.1
bcrypt 4.2.1
beautifulsoup4 4.12.3
bidict 0.23.1
billiard 4.2.1
blinker 1.9.0
build 1.2.2.post1
cachetools 5.5.1
celery 5.4.0
certifi 2024.12.14
cffi 1.17.1
charset-normalizer 3.4.1
chroma-hnswlib 0.7.6
chromadb 0.5.23
click 8.1.8
click-didyoumean 0.3.1
click-plugins 1.1.1
click-repl 0.3.0
colorama 0.4.6
coloredlogs 15.0.1
cryptography 44.0.0
dataclasses-json 0.6.7
Deprecated 1.2.17
distro 1.9.0
dnspython 2.7.0
docx2txt 0.8
durationpy 0.9
et_xmlfile 2.0.0
eventlet 0.39.0
exceptiongroup 1.2.2
fastapi 0.115.7
filelock 3.17.0
Flask 3.1.0
Flask-Cors 5.0.0
Flask-SocketIO 5.5.1
flatbuffers 25.1.24
frozenlist 1.5.0
fsspec 2024.12.0
google-auth 2.38.0
googleapis-common-protos 1.66.0
greenlet 3.1.1
grpcio 1.70.0
h11 0.14.0
httpcore 1.0.7
httptools 0.6.4
httpx 0.28.1
httpx-sse 0.4.0
huggingface-hub 0.27.1
humanfriendly 10.0
idna 3.10
importlib_metadata 8.5.0
importlib_resources 6.5.2
itsdangerous 2.2.0
Jinja2 3.1.5
jiter 0.8.2
jsonpatch 1.33
jsonpointer 3.0.0
kombu 5.4.2
kubernetes 32.0.0
langchain 0.3.15
langchain-chroma 0.2.0
langchain-community 0.3.15
langchain-core 0.3.31
langchain-openai 0.3.2
langchain-text-splitters 0.3.5
langsmith 0.3.1
lxml 5.3.0
markdown-it-py 3.0.0
MarkupSafe 3.0.2
marshmallow 3.26.0
mdurl 0.1.2
mmh3 5.1.0
monotonic 1.6
mpmath 1.3.0
multidict 6.1.0
mypy-extensions 1.0.0
numpy 1.26.4
oauthlib 3.2.2
onnxruntime 1.20.1
openai 1.60.1
openpyxl 3.1.5
opentelemetry-api 1.29.0
opentelemetry-exporter-otlp-proto-common 1.29.0
opentelemetry-exporter-otlp-proto-grpc 1.29.0
opentelemetry-instrumentation 0.50b0
opentelemetry-instrumentation-asgi 0.50b0
opentelemetry-instrumentation-fastapi 0.50b0
opentelemetry-proto 1.29.0
opentelemetry-sdk 1.29.0
opentelemetry-semantic-conventions 0.50b0
opentelemetry-util-http 0.50b0
orjson 3.10.15
overrides 7.7.0
packaging 24.2
pandas 2.2.3
pdf2image 1.17.0
pdfminer.six 20231228
pdfplumber 0.11.5
pillow 11.1.0
pip 22.0.2
posthog 3.10.0
prompt_toolkit 3.0.50
propcache 0.2.1
protobuf 5.29.3
pyasn1 0.6.1
pyasn1_modules 0.4.1
pycparser 2.22
pydantic 2.10.6
pydantic_core 2.27.2
pydantic-settings 2.7.1
Pygments 2.19.1
PyMySQL 1.1.1
pyOpenSSL 25.0.0
pypdfium2 4.30.1
PyPika 0.48.9
pyproject_hooks 1.2.0
pyreadline3 3.5.4
pytesseract 0.3.13
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-engineio 4.11.2
python-pptx 1.0.2
python-socketio 5.12.1
pytz 2024.2
PyYAML 6.0.2
redis 5.2.1
regex 2024.11.6
requests 2.32.3
requests-oauthlib 2.0.0
requests-toolbelt 1.0.0
rich 13.9.4
rsa 4.9
setuptools 59.6.0
shellingham 1.5.4
simple-websocket 1.1.0
six 1.17.0
sniffio 1.3.1
soupsieve 2.6
SQLAlchemy 2.0.37
starlette 0.45.3
sympy 1.13.3
tenacity 9.0.0
tiktoken 0.8.0
tokenizers 0.20.3
tomli 2.2.1
tqdm 4.67.1
typer 0.15.1
typing_extensions 4.12.2
typing-inspect 0.9.0
tzdata 2025.1
urllib3 2.3.0
uvicorn 0.34.0
uvloop 0.21.0
vine 5.1.0
watchfiles 1.0.4
wcwidth 0.2.13
websocket-client 1.8.0
websockets 14.2
Werkzeug 3.1.3
wrapt 1.17.2
wsproto 1.2.0
xlrd 2.0.1
XlsxWriter 3.2.1
yarl 1.18.3
zipp 3.21.0
zstandard 0.23.0