
Llama-3.2-3B-Instruct failed to use with HuggingfacePipeline because of setting a non-string value as the pad_token #29431

Open
tishizaki opened this issue Jan 27, 2025 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@tishizaki (Contributor)

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

The following code:

from langchain_community.llms import HuggingFacePipeline

hf = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 10},
)

from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | hf

question = "What is electroencephalography?"

print(chain.invoke({"question": question}))

Error Message and Stack Trace (if applicable)

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.69it/s]
Traceback (most recent call last):
  File "/home/ishi/work/hf_pipeline_sample.py", line 4, in <module>
    hf = HuggingFacePipeline.from_model_id(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ishi/work/langchain/libs/community/langchain_community/llms/huggingface_pipeline.py", line 172, in from_model_id
    tokenizer.pad_token_id = model.config.eos_token_id
    ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ishi/work/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 1077, in __setattr__
    raise ValueError(f"Cannot set a non-string value as the {key}")
ValueError: Cannot set a non-string value as the pad_token
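For context on the failure: Llama-3.2 models report eos_token_id in the model config as a list of token ids rather than a single int, while the tokenizer's special-token setter only accepts strings, so assigning the list blows up. A minimal stdlib-only sketch of that guard (MockTokenizer is illustrative, not the real transformers class, which routes pad_token_id through convert_ids_to_tokens first):

```python
# Minimal sketch of the guard in transformers' tokenization_utils_base:
# special-token attributes such as pad_token must be strings, but for
# Llama-3.2 models config.eos_token_id is a list (e.g. [128001, 128008,
# 128009]), so the assignment in from_model_id raises.
class MockTokenizer:
    def __setattr__(self, key, value):
        # Mirrors the string-only check for special tokens.
        if key == "pad_token" and not isinstance(value, str):
            raise ValueError(f"Cannot set a non-string value as the {key}")
        super().__setattr__(key, value)

tok = MockTokenizer()
try:
    tok.pad_token = [128001, 128008, 128009]  # list-valued, as in Llama-3.2
except ValueError as e:
    print(e)  # Cannot set a non-string value as the pad_token
```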

Description

I tried to use HuggingFacePipeline from langchain_community with Llama-3.2-3B-Instruct, but the error above occurred.

I think this is the same bug as transformers issue 34869.

When I changed /libs/community/langchain_community/llms/huggingface_pipeline.py L172 as follows, the error no longer occurred.

        if tokenizer.pad_token_id is None:
            if model.config.pad_token_id is not None:
                tokenizer.pad_token_id = model.config.pad_token_id
            elif model.config.eos_token_id is not None and isinstance(model.config.eos_token_id, int):
                tokenizer.pad_token_id = model.config.eos_token_id
            elif tokenizer.eos_token_id is not None:
                tokenizer.pad_token_id = tokenizer.eos_token_id
            else:
                tokenizer.add_special_tokens({"pad_token": "[PAD]"})

It's the same procedure as this pull request.
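The fallback order above can be sketched as a small pure-Python helper (resolve_pad_token_id is a hypothetical name for illustration, not part of LangChain); it returns the pad token id to use, or None when the caller would instead add a new "[PAD]" special token:

```python
def resolve_pad_token_id(tokenizer_pad, config_pad, config_eos, tokenizer_eos):
    """Sketch of the proposed fallback order for choosing a pad_token_id."""
    if tokenizer_pad is not None:
        return tokenizer_pad            # tokenizer already has a pad token
    if config_pad is not None:
        return config_pad               # model config provides one
    if isinstance(config_eos, int):
        return config_eos               # single int eos id is usable
    if tokenizer_eos is not None:
        return tokenizer_eos            # skips list-valued config eos ids
    return None                         # caller adds a "[PAD]" special token

# Llama-3.2 case: config eos is a list, so the tokenizer's own eos id wins.
print(resolve_pad_token_id(None, None, [128001, 128008, 128009], 128009))
```

The isinstance check is what avoids the original crash: a list-valued config.eos_token_id falls through to the tokenizer's own eos_token_id instead of being assigned directly.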

System Info

System Information

OS: Ubuntu 24.04
Kernel Version: 6.8.0-51-generic
Python Version: 3.12.7
Model: Llama-3.2-3B-Instruct

langchain 0.3.15
langchain-community 0.3.15
langchain-core 0.3.31
transformers 4.47.1

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Jan 27, 2025
tishizaki added a commit to tishizaki/langchain that referenced this issue Jan 27, 2025
Description: Add a check for pad_token_id and eos_token_id in the model config. This appears to be the same bug as the HuggingFace TGI bug. In addition, the source code of libs/partners/huggingface/langchain_huggingface/llms/huggingface_pipeline.py also requires similar changes.
Issue: langchain-ai#29431
Dependencies: none
Twitter handle: tell14
ccurme pushed a commit that referenced this issue Jan 27, 2025
- **Description:** Add to check pad_token_id and eos_token_id of model
config. It seems that this is the same bug as the HuggingFace TGI bug.
In addition, the source code of
libs/partners/huggingface/langchain_huggingface/llms/huggingface_pipeline.py
also requires similar changes.
- **Issue:** #29431
- **Dependencies:** none
- **Twitter handle:** tell14
@S0PEX
S0PEX commented Jan 28, 2025

I've updated to langchain-community==0.3.16, but the issue still persists, even though this version already contains fix #29434.

Have you had any luck getting this model working?

Edit: Alright, never mind. Make sure to use "from langchain_community.llms import HuggingFacePipeline", as the same class also exists in langchain.llms!

@tishizaki (Contributor Author)

@S0PEX
The same bug in libs/partners/huggingface/langchain_huggingface/llms/huggingface_pipeline.py still persists.

I haven't posted a patch yet because I haven't confirmed how the tests need to change.
