
[Bug] TypeError: expected str, bytes or os.PathLike object, not NoneType #942

Open

chenyucheng0221 opened this issue Mar 5, 2025 · 2 comments

chenyucheng0221 commented Mar 5, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Hi,

When I run the Mini-InternVL2-DA-Medical 4B and 2B models, I hit the following issue; it seems the vocab file cannot be found. Could you kindly take a look? Thanks!

Reproduction

I copied the script from the model card at https://huggingface.co/OpenGVLab/Mini-InternVL2-4B-DA-Medical and ran it.
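
Based on the traceback below, the tokenizer load is the step that fails; a minimal reproduction of just that call (the import and path lines here are my addition) looks like:

from transformers import LlamaTokenizer

path = 'OpenGVLab/Mini-InternVL2-4B-DA-Medical'
# This call raises the TypeError shown below: vocab_file resolves to None.
tokenizer = LlamaTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)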

Environment

pip install -r requirements.txt, with the requirements.txt provided by the GitHub repo.

Error traceback

Traceback (most recent call last):
  File "/home/yucheng/code/InternVL/test_4B_DA_Medical.py", line 91, in <module>
    tokenizer = LlamaTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
  File "/home/yucheng/miniconda/envs/internvl/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2029, in from_pretrained
    return cls._from_pretrained(
  File "/home/yucheng/miniconda/envs/internvl/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2261, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/yucheng/miniconda/envs/internvl/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 178, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
  File "/home/yucheng/miniconda/envs/internvl/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 206, in get_spm_processor
    with open(self.vocab_file, "rb") as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType
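
A quick way to check whether the checkpoint actually ships the SentencePiece vocab that the slow LlamaTokenizer expects is to list the repository files. This snippet is my addition (not part of the original test script) and assumes the huggingface_hub package is installed:

from huggingface_hub import list_repo_files

# If no tokenizer.model (SentencePiece) file shows up, the slow LlamaTokenizer
# has no vocab_file to open, which matches the TypeError above.
print(list_repo_files('OpenGVLab/Mini-InternVL2-4B-DA-Medical'))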
yuecao0119 (Collaborator) commented

Hi,

Please try

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
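
For context, here is a minimal self-contained sketch of that fix (the import and path lines are assumptions on my part, using the Medical-4B checkpoint from this issue). With trust_remote_code=True, AutoTokenizer picks the tokenizer class declared in the checkpoint's tokenizer_config.json instead of forcing the slow LlamaTokenizer, which is what fails above because its vocab_file resolves to None.

from transformers import AutoTokenizer

path = 'OpenGVLab/Mini-InternVL2-4B-DA-Medical'
# Let AutoTokenizer resolve the tokenizer class from the checkpoint itself,
# rather than instantiating LlamaTokenizer directly.
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)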

chenyucheng0221 (Author) commented

> Hi,
>
> Please try
>
> tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

Hi, in the script for the Medical-4B model, that line is already the same as the one you provided, and I still hit the issue.

If you want to load a model using multiple GPUs, please refer to the Multiple GPUs section.

import torch
from transformers import AutoModel, AutoTokenizer

path = 'OpenGVLab/Mini-InternVL2-4B-DA-Medical'
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
