
Inconsistent results for same input with ChatWatsonx, meta-llama/llama-3-1-70b-instruct, DECODING_METHOD='greedy' #43

Open
polubarev opened this issue Nov 19, 2024 · 5 comments

Comments

@polubarev

I found that the IBM ChatWatsonx integration with LangChain and the IBM watsonx.ai Text Chat API produce inconsistent outputs for the same input, even with decoding_method=greedy or temperature=0.

langchain-ibm == 0.3.3

langchain-core == 0.3.19

ibm_watsonx_ai == 1.1.23

IBM ChatWatsonx integration with Langchain

from langchain_core.messages import (
    HumanMessage,
    SystemMessage,
)
from langchain_ibm import ChatWatsonx
import pandas as pd
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames
from tqdm import tqdm

api_key='api_key'
project_id = 'project_id'

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": api_key}

ibm_llm = ChatWatsonx(
    model_id="meta-llama/llama-3-1-70b-instruct",
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params={
        GenTextParamsMetaNames.DECODING_METHOD: "greedy",
        GenTextParamsMetaNames.MAX_NEW_TOKENS: 5000,
        GenTextParamsMetaNames.MIN_NEW_TOKENS: 1,
    },
)

system_message = SystemMessage(
    content="You are a helpful assistant which telling short-info about provided topic."
)
human_message = HumanMessage(content="horse")

test_res = [ibm_llm.invoke([system_message, human_message]) for _ in tqdm(range(10))]

print(pd.Series([message.content for message in test_res]).nunique())
# prints 10: all 10 outputs for the same input are unique
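When supposedly-greedy completions differ, it can help to see exactly where any two of them diverge. A minimal sketch using only the standard library; the two strings below are placeholders standing in for real model outputs, not actual responses:

```python
import difflib

# Placeholder outputs standing in for two model completions of the same prompt.
out_a = "Horses are large domesticated mammals used for riding and work."
out_b = "Horses are large domesticated animals used for riding and work."

# Locate the first point at which the two completions diverge.
matcher = difflib.SequenceMatcher(a=out_a, b=out_b)
first_diff = next(
    (op for op in matcher.get_opcodes() if op[0] != "equal"),
    None,
)
if first_diff is not None:
    tag, i1, i2, j1, j2 = first_diff
    print(f"First divergence at char {i1}: {out_a[i1:i2]!r} vs {out_b[j1:j2]!r}")
```

In real runs the divergence often starts at a single token, which is a useful clue when deciding whether the sampler or the decoding parameters are at fault.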

IBM watsonx.ai Text Chat API

import requests
from tqdm import tqdm
import pandas as pd

access_token = 'access_token'
project_id = 'project_id'

url = 'https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2023-10-25'
headers = {
    'Authorization': f'Bearer {access_token}', 
    'Content-Type': 'application/json',
    'Accept': 'application/json'
}
data = {
    "model_id": "meta-llama/llama-3-1-70b-instruct",
    "project_id": project_id,
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant which telling short-info about provided topic."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "horse"
                }
            ]
        },
    ],
    "max_tokens": 1000,
    "temperature": 0,
    "time_limit": 10000
}

response = requests.post(url, headers=headers, json=data)

test_res = [requests.post(url, headers=headers, json=data) for _ in tqdm(range(10))]
contents = [
    r.json()['choices'][0]['message']['content']
    for r in test_res
    if r.status_code == 200
]
print(pd.Series(contents).nunique())

This issue affects projects that rely on reproducible outputs, including solutions with agentic behavior.

@MateuszOssGit
Collaborator

Hi @polubarev
It looks like you are providing incorrect parameters for the chat component. The correct params are described in the ChatWatsonx LangChain documentation: https://python.langchain.com/docs/integrations/chat/ibm_watsonx/#instantiation

I retested your code using ChatWatsonx with the params below and got 1 unique response from 10 invokes.

params = {
   "temperature": 0
}
[Screenshot 2024-11-20 at 10:20:38]

Could you please retest your issue with the params set as above?

@polubarev
Author

polubarev commented Nov 20, 2024

Hi @MateuszOssGit
Thanks for the recommendation. I tested it with the new parameters and still get the same result.

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ibm import ChatWatsonx
import pandas as pd
from tqdm import tqdm

api_key = 'api_key'
project_id = 'project_id'

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": api_key}

parameters = {
    'temperature': 0,
    'max_tokens': 200,
}

ibm_llm = ChatWatsonx(
    model_id="meta-llama/llama-3-1-70b-instruct",
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params=parameters,
)

system_message = SystemMessage(
    content="You are a helpful assistant which telling short-info about provided topic."
)
human_message = HumanMessage(content="horse")

test_res = [ibm_llm.invoke([system_message, human_message]) for _ in tqdm(range(20))]
n_unique_outputs = pd.Series([message.content for message in test_res]).nunique()
print(f"Number of unique outputs: {n_unique_outputs}")

Result: Number of unique outputs: 2

Sometimes you need to run more iterations to reproduce this. With 50 iterations I got the following distribution of unique outputs: [36, 6, 6, 1, 1].
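A distribution like the one above can be computed directly from the collected responses with the standard library. A sketch assuming `test_res` holds the messages from the invoke loop; here plain strings stand in for `[message.content for message in test_res]`:

```python
from collections import Counter

# Stand-ins for the 50 collected message contents; in a real run these
# would be [message.content for message in test_res].
contents = ["A"] * 36 + ["B"] * 6 + ["C"] * 6 + ["D"] + ["E"]

# Frequency of each distinct output, most common first.
distribution = sorted(Counter(contents).values(), reverse=True)
print(distribution)  # → [36, 6, 6, 1, 1]
```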

@MateuszOssGit
Collaborator

I conducted a test with 50 iterations and observed six unique outputs. I plan to raise an issue in the ibm_watsonx_ai dependency package repository, as the problem appears to be specifically related to the ibm_watsonx_ai package rather than the langchain_ibm package.

Did you observe this only with the meta-llama/llama-3-1-70b-instruct model?

@polubarev
Author

I have not tested other models, but I think they would show the same issue.
Also, as I mentioned in the first comment, the same problem occurs with the watsonx.ai Text Chat API directly, so I think the problem originates there: https://cloud.ibm.com/apidocs/watsonx-ai#text-chat

@MateuszOssGit
Collaborator

Yes, you are right. I created an issue and will keep you informed about updates.
