Previously working RAG pipeline stopped working - cannot import name 'can_be_positional' from 'pydantic._internal._utils' #370

Open
oboudry-mvp opened this issue Dec 13, 2024 · 10 comments

@oboudry-mvp

I've been successfully running a LangChain RAG pipeline since August 2024, but it stopped working a couple of weeks ago, although there were no code changes. I did not notice immediately because it's not used a lot, and therefore I cannot relate the breakage to a specific open-webui, langchain, or pydantic update.

What is weird is that it runs from an interactive python shell in the same environment, but does not work when loaded from open-webui.

My repo is public and the full code for the pipeline is available here: https://github.com/marvinpac-it/ask-hr-policies-open-webui/blob/master/ask_hr_policies_pipeline.py

When I check the logs of the pipeline pod, I'm getting the following:

olivier@SINGLE-HOST-DOCKER:~/k8s/open-webui$ k logs open-webui-pipelines-769d89bc96-crwrt | tail
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 24.0 -> 24.3.1
[notice] To update, run: pip install --upgrade pip
Error loading module: ask_hr_policies_pipeline
cannot import name 'can_be_positional' from 'pydantic._internal._utils' (/usr/local/lib/python3.11/site-packages/pydantic/_internal/_utils.py)
WARNING:root:No Pipeline class found in ask_hr_policies_pipeline
INFO:     10.233.103.164:37564 - "POST /pipelines/upload HTTP/1.1" 200 OK
INFO:     10.233.103.164:46070 - "GET /pipelines HTTP/1.1" 200 OK
INFO:     10.233.103.164:46074 - "GET /models HTTP/1.1" 200 OK

I tried to exec the code directly inside the pipelines pod, hoping it would help me get better debug messages, but when I do that it runs perfectly well, as can be seen in the following session:

olivier@SINGLE-HOST-DOCKER:~/k8s/open-webui$ k exec -it open-webui-pipelines-769d89bc96-crwrt -- bash
root@open-webui-pipelines-769d89bc96-crwrt:/app# cd pipelines/
root@open-webui-pipelines-769d89bc96-crwrt:/app/pipelines# python
Python 3.11.10 (main, Nov 12 2024, 02:25:24) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from failed.ask_hr_policies_pipeline import Pipeline
>>> p = Pipeline()
>>> import asyncio
>>> asyncio.run(p.on_startup())
on_startup:failed.ask_hr_policies_pipeline
⚠️ It looks like you upgraded from a version below 0.6 and could benefit from vacuuming your database. Run chromadb utils vacuum --help for more information.
>>> result = p.pipe("What is the vacation policy?", "", [], {})
>>> for r in result:
...   r
...
pipe:failed.ask_hr_policies_pipeline
''
'The'
' vacation'
' policy'
' states'
' that'
' employees'
' are'
' entitled'
' to'
' '
'20'
' working'
' days'
' of'
' vacation'
' per'
' calendar'
' year'
--- stripped for sake of space ---

I deployed open-webui and pipelines using the open-webui helm chart from the helm.openwebui.com repo. I'm currently running the latest version.

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: open-webui

helmCharts:
- name: open-webui
  repo: https://helm.openwebui.com/
  version: "4.1.0"
  releaseName: open-webui
  namespace: open-webui
  valuesFile: open-webui-values.yaml

resources:
- open-webui-namespace.yaml
- apis-sealed-secret.yaml
- chromadb-pvc.yaml
- langfuse-sealed-secret.yaml

The pipeline is defined in the values file of the helm chart:

pipelines:
  extraEnvVars:
    - name: OPENAI_API_KEY
      valueFrom:
        secretKeyRef:
          name: openai-config
          key: openai-key
    - name: PIPELINES_URLS
      value: "https://github.com/marvinpac-it/ask-hr-policies-open-webui/blob/master/ask_hr_policies_pipeline.py"
    - name: LANGFUSE_URL
      valueFrom:
        secretKeyRef:

Any help troubleshooting this further would be really appreciated.

@Cody-W-Tucker

Cody-W-Tucker commented Dec 26, 2024

What is your Pydantic version? I set mine to 2.8.2 and the error went away.

# pip list | grep pydantic
pydantic                                 2.8.2
pydantic_core                            2.20.1

Edit: If we pin the requirements, this won't happen again. The versions below work together; I didn't check chroma, though.

requirements: langchain==0.3.1, langchain_core==0.3.7, langchain_openai==0.2.1, langchain_qdrant==0.2.0, langchain_text_splitters==0.3.0
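To make the pinning advice easier to check, here is a minimal sketch that lists the entries in a requirements line that have no exact `==` pin. It assumes the comma-separated `requirements:` format used in this thread; `find_unpinned` is a hypothetical helper name, not something provided by open-webui pipelines.

```python
def find_unpinned(requirements_line: str) -> list:
    """Return the entries of a 'requirements: a==1, b, c==2' line that lack an exact pin."""
    # Drop the optional "requirements:" prefix; keep the whole line if there is no colon.
    spec = requirements_line.split(":", 1)[-1]
    entries = [entry.strip() for entry in spec.split(",") if entry.strip()]
    # Anything without "==" can float to a newer release on the next install.
    return [entry for entry in entries if "==" not in entry]


if __name__ == "__main__":
    line = "requirements: langchain==0.3.1, langchain_chroma, langfuse, pydantic==2.8.2"
    print(find_unpinned(line))
```

Entries reported by this check are the ones that can silently change version when the container reinstalls its dependencies.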

@oboudry-mvp
Author

I was running version 2.10.4

# pip list | grep pydantic
pydantic                                 2.10.4
pydantic_core                            2.27.2

Downgrading to 2.8.2 did not fix the issue for me, but the error message changed. That may give me some hints for troubleshooting.

My new requirements line is:

requirements: langchain==0.3.1, langchain_openai==0.2.1, langchain_chroma, langchain_core==0.3.7, langfuse, pydantic==2.8.2

Below is the tail of the log after forcing 2.8.2.

Installing requirement: pydantic==2.8.2
Collecting pydantic==2.8.2
  Downloading pydantic-2.8.2-py3-none-any.whl.metadata (125 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.2/125.2 kB 4.7 MB/s eta 0:00:00
Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.11/site-packages (from pydantic==2.8.2) (0.7.0)
Collecting pydantic-core==2.20.1 (from pydantic==2.8.2)
  Downloading pydantic_core-2.20.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.11/site-packages (from pydantic==2.8.2) (4.12.2)
Downloading pydantic-2.8.2-py3-none-any.whl (423 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 423.9/423.9 kB 11.7 MB/s eta 0:00:00
Downloading pydantic_core-2.20.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 16.8 MB/s eta 0:00:00
Installing collected packages: pydantic-core, pydantic
  Attempting uninstall: pydantic-core
    Found existing installation: pydantic_core 2.27.2
    Uninstalling pydantic_core-2.27.2:
      Successfully uninstalled pydantic_core-2.27.2
  Attempting uninstall: pydantic
    Found existing installation: pydantic 2.10.4
    Uninstalling pydantic-2.10.4:
      Successfully uninstalled pydantic-2.10.4
Successfully installed pydantic-2.8.2 pydantic-core-2.20.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 24.0 -> 24.3.1
[notice] To update, run: pip install --upgrade pip
Error loading module: ask_hr_policies_pipeline
FieldValidatorDecoratorInfo.__init__() got an unexpected keyword argument 'json_schema_input_type'
WARNING:root:No Pipeline class found in ask_hr_policies_pipeline
INFO:     10.233.79.197:59422 - "POST /pipelines/upload HTTP/1.1" 200 OK
INFO:     10.233.79.197:48756 - "GET /pipelines HTTP/1.1" 200 OK
INFO:     10.233.79.197:48762 - "GET /models HTTP/1.1" 200 OK

I'll look into that new error. Thanks for the help.

@Cody-W-Tucker

I had that error too. It came from a mismatch between the langchain modules and the older pydantic version: LangChain sends a different structure than Pydantic expected.

Until I find a better way to manage dependencies, these are the working versions. You'll just have to try chromadb at different versions.

requirements: langchain==0.3.1, langchain_core==0.3.7, langchain_openai==0.2.1, langchain_qdrant==0.2.0, langchain_text_splitters==0.3.0, pydantic==2.8.2

@zenarcher007

I had this problem as well, and these suggestions did not work for me, so I investigated the code some. Since the main.py script imports and calls methods from your pipeline script, your pipeline runs in the same Python process as the main script.

main.py imports pydantic (2.7.1). Python caches imported modules so that each one is processed only once, and it was not designed to have a module updated while the process is running. Therefore, after updating pydantic, importing it in the pipeline has no effect: the process still uses 2.7.1.

You can verify this by printing the version of the module in your pipeline:

"""
 ...
requirements: pydantic==2.10.5    # Installs upon uploading to webui
"""
import pydantic
print(pydantic.__version__)

2.7.1

Essentially, this means that you cannot use "requirements:" from the web UI to change the version of any package that is also imported in main.py. I would recommend either setting PIPELINES_REQUIREMENTS_PATH to an existing requirements.txt or using PIPELINES_URLS, as these both make it install packages before the main script is started.

Possible workarounds:

  • "Re-import the module" - "Hot loading" (using importlib.reload and sys.modules.pop) did not work for me

Possible solutions:

  • I am considering creating a PR for automatically exiting and restarting the main.py script after updating requirements. One drawback I see is that it could make main.py run using different package versions upon its first restart, which might make it less consistent.
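The caching behaviour described above can be reproduced without pydantic. The sketch below is only an illustration: it writes a throwaway module (the name `fake_pkg_demo` is made up), imports it, rewrites the file on disk to mimic a pip upgrade, and imports it again in the same process.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_import_cache() -> tuple:
    """Return the version seen before and after the file on disk is changed."""
    with tempfile.TemporaryDirectory() as tmp:
        module_path = Path(tmp) / "fake_pkg_demo.py"  # hypothetical stand-in for pydantic
        module_path.write_text('__version__ = "2.7.1"\n')
        sys.path.insert(0, tmp)
        try:
            first = importlib.import_module("fake_pkg_demo").__version__
            # Mimic "pip install fake_pkg_demo==2.10.5" while the process keeps running.
            module_path.write_text('__version__ = "2.10.5"\n')
            # A second import is answered from sys.modules; the file is not read again.
            second = importlib.import_module("fake_pkg_demo").__version__
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("fake_pkg_demo", None)
    return first, second


if __name__ == "__main__":
    print(demo_import_cache())  # ('2.7.1', '2.7.1')
```

Both values are the old version, which matches what the pipeline sees after a `requirements:` install. A real package such as pydantic also has compiled parts and many submodules already loaded, which is probably why reloading it in place is fragile.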

@oboudry-mvp
Author

Indeed, I'm getting the same pydantic version regardless of the requirements section:

k logs open-webui-pipelines-bf6db98c9-9nkk7 | grep PYDANTIC
PYDANTIC: 2.7.1
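A check like this can be extended to show both sides of the mismatch: the version installed on disk and the version already loaded in the running process. This is a rough sketch using only the standard library; `version_report` is a hypothetical helper, and it assumes the module exposes `__version__` (pydantic does).

```python
import sys
from importlib import metadata
from typing import Optional


def version_report(dist_name: str, module_name: Optional[str] = None) -> dict:
    """Compare the installed distribution version with the one loaded in this process."""
    module_name = module_name or dist_name
    try:
        # Read from the installed package metadata on disk.
        on_disk = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        on_disk = None
    # Only look at what is already imported; do not trigger a new import here.
    loaded = sys.modules.get(module_name)
    in_memory = getattr(loaded, "__version__", None) if loaded is not None else None
    stale = on_disk is not None and in_memory is not None and on_disk != in_memory
    return {"on_disk": on_disk, "in_memory": in_memory, "stale": stale}


if __name__ == "__main__":
    print("PYDANTIC:", version_report("pydantic"))
```

If `stale` is true, pip has installed a different version than the one the process is using, and only a restart of the process will pick it up.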

I'm already using PIPELINES_URLS for deploying the pipeline but it still fails.

In the end this line of requirements fixed the issue for me:

requirements: langchain==0.3.3, langchain_openai==0.2.2, langchain_chroma==0.1.4, langchain_core==0.3.10, openai==1.51.2, langfuse==2.52.0

Thanks for the guidance.

@Cody-W-Tucker

These requirements are a real pain.

If the pipelines container restarts, the start script pulls in the default version of pydantic and, because of the conflict between langchain and pydantic, gets stuck in an infinite loop of trying to load the dependencies over and over.

It should be noted that LangChain now suggests using LangGraph for these history-aware retrievers, so a rewrite might help.

@oboudry-mvp
Author

What also made debugging painful was the lack of a proper stack trace showing where the code breaks: I was only getting a single-line error message. The same was true during initial setup, when I was trying to find the right requirements line. I'm not sure if there is a better way to debug.
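One possible way to get more than a single line is to load the pipeline file by path and capture the full traceback. This is a sketch for local debugging, not how the pipelines server loads modules; `load_with_traceback` is a hypothetical helper. Note that in a fresh interpreter the import may succeed, as in the interactive session earlier in this thread, because the stale modules only exist in the long-running server process.

```python
import importlib.util
import traceback
from pathlib import Path


def load_with_traceback(path: str):
    """Load a Python file as a module; return (module, None) or (None, traceback_text)."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    if spec is None or spec.loader is None:
        return None, f"Cannot build an import spec for {path}"
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception:
        # format_exc() keeps the whole chain of frames, not just str(exception).
        return None, traceback.format_exc()
    return module, None


if __name__ == "__main__":
    module, error = load_with_traceback("ask_hr_policies_pipeline.py")
    print(error if error else f"Loaded {module.__name__}")
```

The traceback shows which import inside langchain or pydantic actually raised, which narrows down the package pair that is out of sync.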

@Subhankar-Adak

The following steps worked for me:

Restart the pod. This cleans up any dependency mismatch left over from multiple attempts.

Install the following versions:

pip install pydantic==2.7.4 langchain==0.3.3 langchain-community==0.3.2 langchain-openai==0.2.2 langchain-core==0.3.10 langchain-text-splitters==0.3.0
pip install langchain-nvidia-ai-endpoints

@oboudry-mvp
Author

Agreed, I also had to delete the pod (I'm hosting this on K8S) to make it work. Just changing the requirements version is not enough.

@argafurov

I'm running with Docker Compose. I run docker compose up -d to start the pipelines and openwebui services, and then docker compose restart pipelines. After that, my pipeline operates as expected. Version: v.0.4.8
