
Inconsistent results for same input with ChatWatsonx, meta-llama/llama-3-1-70b-instruct, DECODING_METHOD='greedy' #43

Open
polubarev opened this issue Nov 19, 2024 · 5 comments

Comments

@polubarev

I found that the IBM ChatWatsonx integration with LangChain and the IBM watsonx.ai Text Chat API produce inconsistent outputs for the same input, even with decoding_method=greedy or temperature=0.

langchain-ibm == 0.3.3

langchain-core == 0.3.19

ibm_watsonx_ai == 1.1.23

IBM ChatWatsonx integration with Langchain

from langchain_core.messages import (
    HumanMessage,
    SystemMessage,
)
from langchain_ibm import ChatWatsonx
import pandas as pd
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames
from tqdm import tqdm

api_key='api_key'
project_id = 'project_id'

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": api_key}

ibm_llm = ChatWatsonx(
    model_id="meta-llama/llama-3-1-70b-instruct",
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params={
        GenTextParamsMetaNames.DECODING_METHOD: "greedy",
        GenTextParamsMetaNames.MAX_NEW_TOKENS: 5000,
        GenTextParamsMetaNames.MIN_NEW_TOKENS: 1,
    },
)

system_message = SystemMessage(
    content="You are a helpful assistant which telling short-info about provided topic."
)
human_message = HumanMessage(content="horse")

test_res = [ibm_llm.invoke([system_message, human_message]) for _ in tqdm(range(10))]

print(pd.Series([message.content for message in test_res]).nunique())
# prints 10: all 10 outputs for the same input are unique
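When supposedly-greedy completions differ, it can help to see exactly where any two of them diverge. A minimal sketch using only the standard library; the two strings below are placeholders standing in for real model outputs, not actual responses:

```python
import difflib

# Placeholder outputs standing in for two model completions of the same prompt.
out_a = "Horses are large domesticated mammals used for riding and work."
out_b = "Horses are large domesticated animals used for riding and work."

# Locate the first point at which the two completions diverge.
matcher = difflib.SequenceMatcher(a=out_a, b=out_b)
first_diff = next(
    (op for op in matcher.get_opcodes() if op[0] != "equal"),
    None,
)
if first_diff is not None:
    tag, i1, i2, j1, j2 = first_diff
    print(f"First divergence at char {i1}: {out_a[i1:i2]!r} vs {out_b[j1:j2]!r}")
```

In real runs the divergence often starts at a single token, which is a useful clue when deciding whether the sampler or the decoding parameters are at fault.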

IBM watsonx.ai Text Chat API

import requests
from tqdm import tqdm
import pandas as pd

access_token = 'access_token'
project_id = 'project_id'

url = 'https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2023-10-25'
headers = {
    'Authorization': f'Bearer {access_token}', 
    'Content-Type': 'application/json',
    'Accept': 'application/json'
}
data = {
    "model_id": "meta-llama/llama-3-1-70b-instruct",
    "project_id": project_id,
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant which telling short-info about provided topic."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "horse"
                }
            ]
        },
    ],
    "max_tokens": 1000,
    "temperature": 0,
    "time_limit": 10000
}

response = requests.post(url, headers=headers, json=data)

test_res = [requests.post(url, headers=headers, json=data) for _ in tqdm(range(10))]
contents = [
    r.json()['choices'][0]['message']['content']
    for r in test_res
    if r.status_code == 200
]
print(pd.Series(contents).nunique())

This issue affects projects that rely on reproducible outputs, including solutions with agentic behavior.

@MateuszOssGit
Collaborator

Hi @polubarev
It looks like you are providing incorrect parameters for the chat component. The correct params are described in the ChatWatsonx LangChain documentation: https://python.langchain.com/docs/integrations/chat/ibm_watsonx/#instantiation

I retested your code using ChatWatsonx with the params below and got 1 unique response from 10 invokes.

params = {
   "temperature": 0
}
[Screenshot 2024-11-20 at 10:20:38]

Could you please retest your issue with the params set as above?

@polubarev
Author

polubarev commented Nov 20, 2024

Hi @MateuszOssGit
Thanks for the recommendation. I tested it with the new parameters and still get the same result.

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ibm import ChatWatsonx
import pandas as pd
from tqdm import tqdm

api_key = 'api_key'
project_id = 'project_id'

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": api_key}

parameters = {
    'temperature': 0,
    'max_tokens': 200,
}

ibm_llm = ChatWatsonx(
    model_id="meta-llama/llama-3-1-70b-instruct",
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    params=parameters,
)

system_message = SystemMessage(
    content="You are a helpful assistant which telling short-info about provided topic."
)
human_message = HumanMessage(content="horse")

test_res = [ibm_llm.invoke([system_message, human_message]) for _ in tqdm(range(20))]
n_unique_outputs = pd.Series([message.content for message in test_res]).nunique()
print(f"Number of unique outputs: {n_unique_outputs}")

Result: Number of unique outputs: 2

Sometimes you need to run more iterations to reproduce this. With 50 iterations I got the following distribution of unique outputs: [36, 6, 6, 1, 1].
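A distribution like the one above can be computed directly from the collected responses with the standard library. A sketch assuming `test_res` holds the messages from the invoke loop; here plain strings stand in for `[message.content for message in test_res]`:

```python
from collections import Counter

# Stand-ins for the 50 collected message contents; in a real run these
# would be [message.content for message in test_res].
contents = ["A"] * 36 + ["B"] * 6 + ["C"] * 6 + ["D"] + ["E"]

# Frequency of each distinct output, most common first.
distribution = sorted(Counter(contents).values(), reverse=True)
print(distribution)  # → [36, 6, 6, 1, 1]
```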

@MateuszOssGit
Collaborator

I conducted a test with 50 iterations and observed six unique outputs. I plan to raise an issue in the ibm_watsonx_ai dependency package repository, as the problem appears to be specifically related to the ibm_watsonx_ai package rather than the langchain_ibm package.

Did you observe this only with the meta-llama/llama-3-1-70b-instruct model?

@polubarev
Author

I have not tested other models, but I think they would show the same issue.
Also, as I mentioned in the first comment, the same problem occurs with the watsonx.ai Text Chat API directly, so I think the problem originates there: https://cloud.ibm.com/apidocs/watsonx-ai#text-chat

@MateuszOssGit
Collaborator

Yes, you are right. I created an issue and will keep you informed about updates.
