
[Bug] TypeError: expected str, bytes or os.PathLike object, not NoneType #942

Open

chenyucheng0221 opened this issue Mar 5, 2025 · 2 comments

chenyucheng0221 commented Mar 5, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Hi,

When I run the Mini-InternVL2-DA-Medical 4B and 2B models, I hit the following issue; it seems the vocab file cannot be found. Could you kindly take a look? Thanks!

Reproduction

I copied the script from the model card at https://huggingface.co/OpenGVLab/Mini-InternVL2-4B-DA-Medical and ran it.
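
Based on the traceback below, the tokenizer load is the step that fails; a minimal reproduction of just that call (the import and path lines here are my addition) looks like:

from transformers import LlamaTokenizer

path = 'OpenGVLab/Mini-InternVL2-4B-DA-Medical'
# This call raises the TypeError shown below: vocab_file resolves to None.
tokenizer = LlamaTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)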

Environment

pip install -r requirements.txt, with the requirements.txt provided by the GitHub repo.

Error traceback

Traceback (most recent call last):
  File "/home/yucheng/code/InternVL/test_4B_DA_Medical.py", line 91, in <module>
    tokenizer = LlamaTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
  File "/home/yucheng/miniconda/envs/internvl/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2029, in from_pretrained
    return cls._from_pretrained(
  File "/home/yucheng/miniconda/envs/internvl/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2261, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/yucheng/miniconda/envs/internvl/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 178, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
  File "/home/yucheng/miniconda/envs/internvl/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 206, in get_spm_processor
    with open(self.vocab_file, "rb") as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType
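
A quick way to check whether the checkpoint actually ships the SentencePiece vocab that the slow LlamaTokenizer expects is to list the repository files. This snippet is my addition (not part of the original test script) and assumes the huggingface_hub package is installed:

from huggingface_hub import list_repo_files

# If no tokenizer.model (SentencePiece) file shows up, the slow LlamaTokenizer
# has no vocab_file to open, which matches the TypeError above.
print(list_repo_files('OpenGVLab/Mini-InternVL2-4B-DA-Medical'))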
yuecao0119 (Collaborator) commented

Hi,

Please try

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
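
For context, here is a minimal self-contained sketch of that fix (the import and path lines are assumptions on my part, using the Medical-4B checkpoint from this issue). With trust_remote_code=True, AutoTokenizer picks the tokenizer class declared in the checkpoint's tokenizer_config.json instead of forcing the slow LlamaTokenizer, which is what fails above because its vocab_file resolves to None.

from transformers import AutoTokenizer

path = 'OpenGVLab/Mini-InternVL2-4B-DA-Medical'
# Let AutoTokenizer resolve the tokenizer class from the checkpoint itself,
# rather than instantiating LlamaTokenizer directly.
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)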

chenyucheng0221 (Author) commented

> Hi,
>
> Please try
>
> tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

Hi, in the script for the Medical-4B model, that line is already the same as the one you provided, and I still hit the issue.

If you want to load a model using multiple GPUs, please refer to the Multiple GPUs section.

import torch
from transformers import AutoModel, AutoTokenizer

path = 'OpenGVLab/Mini-InternVL2-4B-DA-Medical'
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
