[docs] fix bugs in the bitsandbytes documentation #35868

Open
wants to merge 1 commit into main
Conversation

@faaany (Contributor) commented Jan 24, 2025

What does this PR do?

When running the example code snippets on XPU, I got the following three errors:

Traceback (most recent call last):
  File "/home/sdp/fanli/doc_to_fix.py", line 5, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, BitsAndBytesConfig(load_in_4bit=True))
  File "/home/sdp/fanli/transformers/src/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/home/sdp/fanli/transformers/src/transformers/modeling_utils.py", line 4129, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: OPTForCausalLM.__init__() takes 2 positional arguments but 3 were given
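For reference, this first error goes away once the quantization config is passed as a keyword argument instead of positionally. A minimal sketch, with a placeholder checkpoint (not necessarily the one used in the docs):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "facebook/opt-350m"  # placeholder checkpoint for illustration

# Passing the config positionally triggers the TypeError above;
# from_pretrained expects it via the quantization_config keyword argument.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
```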
Traceback (most recent call last):
  File "/home/sdp/fanli/doc_to_fix.py", line 5, in <module>
    quantization_config = BitsAndBytesConfig(
  File "/home/sdp/fanli/transformers/src/transformers/utils/quantization_config.py", line 431, in __init__
    self.post_init()
  File "/home/sdp/fanli/transformers/src/transformers/utils/quantization_config.py", line 470, in post_init
    raise TypeError("llm_int8_threshold must be a float")
TypeError: llm_int8_threshold must be a float
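The second error comes from post_init() rejecting integer thresholds, so llm_int8_threshold has to be a Python float. A minimal sketch; the threshold value here is only illustrative:

```python
from transformers import BitsAndBytesConfig

# llm_int8_threshold must be a float, so write 10.0 rather than 10;
# the value itself is illustrative, not the one from the docs.
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=10.0,
)
```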
Traceback (most recent call last):
  File "/home/sdp/fanli/doc_to_fix.py", line 18, in <module>
    model_8bit = AutoModelForCausalLM.from_pretrained(
  File "/home/sdp/fanli/transformers/src/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/home/sdp/fanli/transformers/src/transformers/modeling_utils.py", line 3633, in from_pretrained
    hf_quantizer.validate_environment(
  File "/home/sdp/fanli/transformers/src/transformers/quantizers/quantizer_bnb_4bit.py", line 103, in validate_environment
    raise ValueError(
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `llm_int8_enable_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details. 
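The third error is the offload check: the model does not fully fit on the device, so the message asks for both the fp32 CPU-offload flag and an explicit device_map. A minimal sketch, assuming a BLOOM-style checkpoint; the module names in the device_map depend on whichever architecture the docs actually load:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,  # keep offloaded modules in fp32
)

# Placeholder module names matching a BLOOM-style model;
# adjust them to the checkpoint the docs actually use.
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}

model_8bit = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",  # placeholder checkpoint
    quantization_config=quantization_config,
    device_map=device_map,
)
```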

I also found that the code below returns an error saying that no appropriate model files exist in the repo. I'm not sure whether we should replace it with meta-llama/Llama-2-13b-chat-hf, so I didn't make that change in this PR.

model_double_quant = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b", torch_dtype="auto", quantization_config=double_quant_config)

If we should update it as well, just let me know.

cc: @stevhliu

@stevhliu (Member) left a comment


Yeah, feel free to update to the meta-llama/Llama-2-13b-chat-hf checkpoint!
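For reference, a sketch of what the updated snippet could look like with the chat checkpoint. The nested-quantization config here is an assumed stand-in for the double_quant_config the docs define, not necessarily the exact one:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed stand-in for the docs' double_quant_config (nested 4-bit quantization).
double_quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
)

model_double_quant = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    torch_dtype="auto",
    quantization_config=double_quant_config,
)
```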
