[docs] fix bugs in the bitsandbytes documentation #35868

Open
wants to merge 1 commit into main
Conversation

@faaany (Contributor) commented Jan 24, 2025

What does this PR do?

When running the example code snippets on XPU, I got the following three errors:

Traceback (most recent call last):
  File "/home/sdp/fanli/doc_to_fix.py", line 5, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, BitsAndBytesConfig(load_in_4bit=True))
  File "/home/sdp/fanli/transformers/src/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/home/sdp/fanli/transformers/src/transformers/modeling_utils.py", line 4129, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: OPTForCausalLM.__init__() takes 2 positional arguments but 3 were given
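For reference, this first error goes away once the quantization config is passed as a keyword argument instead of positionally. A minimal sketch, with a placeholder checkpoint (not necessarily the one used in the docs):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "facebook/opt-350m"  # placeholder checkpoint for illustration

# Passing the config positionally triggers the TypeError above;
# from_pretrained expects it via the quantization_config keyword argument.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
```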
Traceback (most recent call last):
  File "/home/sdp/fanli/doc_to_fix.py", line 5, in <module>
    quantization_config = BitsAndBytesConfig(
  File "/home/sdp/fanli/transformers/src/transformers/utils/quantization_config.py", line 431, in __init__
    self.post_init()
  File "/home/sdp/fanli/transformers/src/transformers/utils/quantization_config.py", line 470, in post_init
    raise TypeError("llm_int8_threshold must be a float")
TypeError: llm_int8_threshold must be a float
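The second error comes from post_init() rejecting integer thresholds, so llm_int8_threshold has to be a Python float. A minimal sketch; the threshold value here is only illustrative:

```python
from transformers import BitsAndBytesConfig

# llm_int8_threshold must be a float, so write 10.0 rather than 10;
# the value itself is illustrative, not the one from the docs.
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=10.0,
)
```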
Traceback (most recent call last):
  File "/home/sdp/fanli/doc_to_fix.py", line 18, in <module>
    model_8bit = AutoModelForCausalLM.from_pretrained(
  File "/home/sdp/fanli/transformers/src/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/home/sdp/fanli/transformers/src/transformers/modeling_utils.py", line 3633, in from_pretrained
    hf_quantizer.validate_environment(
  File "/home/sdp/fanli/transformers/src/transformers/quantizers/quantizer_bnb_4bit.py", line 103, in validate_environment
    raise ValueError(
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `llm_int8_enable_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details. 
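The third error is the offload check: the model does not fully fit on the device, so the message asks for both the fp32 CPU-offload flag and an explicit device_map. A minimal sketch, assuming a BLOOM-style checkpoint; the module names in the device_map depend on whichever architecture the docs actually load:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,  # keep offloaded modules in fp32
)

# Placeholder module names matching a BLOOM-style model;
# adjust them to the checkpoint the docs actually use.
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}

model_8bit = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",  # placeholder checkpoint
    quantization_config=quantization_config,
    device_map=device_map,
)
```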

I also found that the code below returns an error saying that no appropriate model files exist in the repo. I'm not sure whether we should replace it with meta-llama/Llama-2-13b-chat-hf, so I didn't make that change in this PR.

model_double_quant = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b", torch_dtype="auto", quantization_config=double_quant_config)

If we should update it as well, just let me know.

cc: @stevhliu

@stevhliu (Member) left a comment


Yeah, feel free to update to the meta-llama/Llama-2-13b-chat-hf checkpoint!
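For reference, a sketch of what the updated snippet could look like with the chat checkpoint. The nested-quantization config here is an assumed stand-in for the double_quant_config the docs define, not necessarily the exact one:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed stand-in for the docs' double_quant_config (nested 4-bit quantization).
double_quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
)

model_double_quant = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    torch_dtype="auto",
    quantization_config=double_quant_config,
)
```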
