Inference error after quantizing llama2-7b-chat #24
Comments
This may be a transformers version issue. Try version 4.36.2, or modify the llama.py code to match your installed transformers library.
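A quick way to confirm whether the installed transformers matches the suggested 4.36.2 is sketched below. It uses only the standard library; the helper names (`installed_version`, `matches_expected`) are made up for illustration and are not part of AutoSmoothQuant or transformers.

```python
from importlib import metadata

# Version suggested in this thread; adjust if the maintainers recommend another.
EXPECTED = "4.36.2"


def installed_version(package="transformers"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(installed, expected=EXPECTED):
    """Compare version strings, ignoring a local build suffix such as '+cu118'."""
    return installed is not None and installed.split("+")[0] == expected


def report(installed, expected=EXPECTED):
    """Build a one-line, human-readable status message."""
    if installed is None:
        return "transformers is not installed"
    if matches_expected(installed, expected):
        return f"transformers {installed} matches {expected}"
    return f"transformers {installed} differs from {expected}"


print(report(installed_version()))
```

If the versions differ, `pip install transformers==4.36.2` in the same environment is the usual way to pin it.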
Some weights of the model checkpoint at /data/AutoSmoothQuant/quantized_model/llama2-7b-chat/Llama-2-7b-chat-hf-smoothquant were not used when initializing Int8LlamaForCausalLM: ['model.layers.30.mlp.down_proj.quant_scale', 'model.layers.6.self_attn.o_proj.quant_scale', 'model.layers.5.self_attn.o_proj.quant_scale', 'model.layers.12.mlp.down_proj.quant_scale', 'model.layers.29.mlp.down_proj.quant_scale', 'model.layers.19.self_attn.o_proj.quant_scale', 'model.layers.26.self_attn.o_proj.quant_scale', 'model.layers.3.self_attn.o_proj.quant_scale', 'model.layers.10.mlp.down_proj.quant_scale', 'model.layers.23.self_attn.o_proj.quant_scale', 'model.layers.8.mlp.down_proj.quant_scale', 'model.layers.4.mlp.down_proj.quant_scale', 'model.layers.5.mlp.down_proj.quant_scale', 'model.layers.24.self_attn.o_proj.quant_scale', 'model.layers.27.mlp.down_proj.quant_scale', 'model.layers.18.mlp.down_proj.quant_scale', 'model.layers.29.self_attn.o_proj.quant_scale', 'model.layers.28.self_attn.o_proj.quant_scale', 'model.layers.30.self_attn.o_proj.quant_scale', 'model.layers.15.mlp.down_proj.quant_scale', 'model.layers.20.self_attn.o_proj.quant_scale', 'model.layers.21.mlp.down_proj.quant_scale', 'model.layers.9.self_attn.o_proj.quant_scale', 'model.layers.22.mlp.down_proj.quant_scale', 'model.layers.10.self_attn.o_proj.quant_scale', 'model.layers.28.mlp.down_proj.quant_scale', 'model.layers.23.mlp.down_proj.quant_scale', 'model.layers.25.mlp.down_proj.quant_scale', 'model.layers.14.mlp.down_proj.quant_scale', 'model.layers.9.mlp.down_proj.quant_scale', 'model.layers.7.self_attn.o_proj.quant_scale', 'model.layers.27.self_attn.o_proj.quant_scale', 'model.layers.16.mlp.down_proj.quant_scale', 'model.layers.1.self_attn.o_proj.quant_scale', 'model.layers.14.self_attn.o_proj.quant_scale', 'model.layers.31.self_attn.o_proj.quant_scale', 'model.layers.16.self_attn.o_proj.quant_scale', 'model.layers.11.mlp.down_proj.quant_scale', 'model.layers.20.mlp.down_proj.quant_scale', 
'model.layers.2.self_attn.o_proj.quant_scale', 'model.layers.24.mlp.down_proj.quant_scale', 'model.layers.18.self_attn.o_proj.quant_scale', 'model.layers.8.self_attn.o_proj.quant_scale', 'model.layers.26.mlp.down_proj.quant_scale', 'model.layers.17.self_attn.o_proj.quant_scale', 'model.layers.17.mlp.down_proj.quant_scale', 'model.layers.2.mlp.down_proj.quant_scale', 'model.layers.22.self_attn.o_proj.quant_scale', 'model.layers.6.mlp.down_proj.quant_scale', 'model.layers.0.mlp.down_proj.quant_scale', 'model.layers.13.self_attn.o_proj.quant_scale', 'model.layers.4.self_attn.o_proj.quant_scale', 'model.layers.11.self_attn.o_proj.quant_scale', 'model.layers.0.self_attn.o_proj.quant_scale', 'model.layers.1.mlp.down_proj.quant_scale', 'model.layers.21.self_attn.o_proj.quant_scale', 'model.layers.7.mlp.down_proj.quant_scale', 'model.layers.12.self_attn.o_proj.quant_scale', 'model.layers.3.mlp.down_proj.quant_scale', 'model.layers.19.mlp.down_proj.quant_scale', 'model.layers.25.self_attn.o_proj.quant_scale', 'model.layers.31.mlp.down_proj.quant_scale', 'model.layers.13.mlp.down_proj.quant_scale', 'model.layers.15.self_attn.o_proj.quant_scale']
When asked "1+1=?", the model answers like this:

:) Hi, experts
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:42<00:00, 21.33s/it]
Some weights of the model checkpoint at /data/AutoSmoothQuant/quantized_model/llama2-7b-chat/Llama-2-7b-chat-hf-smoothquant were not used when initializing Int8LlamaForCausalLM: ['model.layers.0.mlp.down_proj.quant_scale', 'model.layers.0.self_attn.o_proj.quant_scale', 'model.layers.1.mlp.down_proj.quant_scale', 'model.layers.1.self_attn.o_proj.quant_scale', 'model.layers.10.mlp.down_proj.quant_scale', 'model.layers.10.self_attn.o_proj.quant_scale', 'model.layers.11.mlp.down_proj.quant_scale', 'model.layers.11.self_attn.o_proj.quant_scale', 'model.layers.12.mlp.down_proj.quant_scale', 'model.layers.12.self_attn.o_proj.quant_scale', 'model.layers.13.mlp.down_proj.quant_scale', 'model.layers.13.self_attn.o_proj.quant_scale', 'model.layers.14.mlp.down_proj.quant_scale', 'model.layers.14.self_attn.o_proj.quant_scale', 'model.layers.15.mlp.down_proj.quant_scale', 'model.layers.15.self_attn.o_proj.quant_scale', 'model.layers.16.mlp.down_proj.quant_scale', 'model.layers.16.self_attn.o_proj.quant_scale', 'model.layers.17.mlp.down_proj.quant_scale', 'model.layers.17.self_attn.o_proj.quant_scale', 'model.layers.18.mlp.down_proj.quant_scale', 'model.layers.18.self_attn.o_proj.quant_scale', 'model.layers.19.mlp.down_proj.quant_scale', 'model.layers.19.self_attn.o_proj.quant_scale', 'model.layers.2.mlp.down_proj.quant_scale', 'model.layers.2.self_attn.o_proj.quant_scale', 'model.layers.20.mlp.down_proj.quant_scale', 'model.layers.20.self_attn.o_proj.quant_scale', 'model.layers.21.mlp.down_proj.quant_scale', 'model.layers.21.self_attn.o_proj.quant_scale', 'model.layers.22.mlp.down_proj.quant_scale', 'model.layers.22.self_attn.o_proj.quant_scale', 'model.layers.23.mlp.down_proj.quant_scale', 'model.layers.23.self_attn.o_proj.quant_scale', 'model.layers.24.mlp.down_proj.quant_scale', 'model.layers.24.self_attn.o_proj.quant_scale', 'model.layers.25.mlp.down_proj.quant_scale', 'model.layers.25.self_attn.o_proj.quant_scale', 'model.layers.26.mlp.down_proj.quant_scale', 
'model.layers.26.self_attn.o_proj.quant_scale', 'model.layers.27.mlp.down_proj.quant_scale', 'model.layers.27.self_attn.o_proj.quant_scale', 'model.layers.28.mlp.down_proj.quant_scale', 'model.layers.28.self_attn.o_proj.quant_scale', 'model.layers.29.mlp.down_proj.quant_scale', 'model.layers.29.self_attn.o_proj.quant_scale', 'model.layers.3.mlp.down_proj.quant_scale', 'model.layers.3.self_attn.o_proj.quant_scale', 'model.layers.30.mlp.down_proj.quant_scale', 'model.layers.30.self_attn.o_proj.quant_scale', 'model.layers.31.mlp.down_proj.quant_scale', 'model.layers.31.self_attn.o_proj.quant_scale', 'model.layers.4.mlp.down_proj.quant_scale', 'model.layers.4.self_attn.o_proj.quant_scale', 'model.layers.5.mlp.down_proj.quant_scale', 'model.layers.5.self_attn.o_proj.quant_scale', 'model.layers.6.mlp.down_proj.quant_scale', 'model.layers.6.self_attn.o_proj.quant_scale', 'model.layers.7.mlp.down_proj.quant_scale', 'model.layers.7.self_attn.o_proj.quant_scale', 'model.layers.8.mlp.down_proj.quant_scale', 'model.layers.8.self_attn.o_proj.quant_scale', 'model.layers.9.mlp.down_proj.quant_scale', 'model.layers.9.self_attn.o_proj.quant_scale']
Traceback (most recent call last):
File "/data/AutoSmoothQuant/autosmoothquant/examples/test_model.py", line 60, in <module>
main()
File "/data/conda_env/base/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data/AutoSmoothQuant/autosmoothquant/examples/test_model.py", line 54, in main
output_ids = model.generate(**inputs, max_new_tokens=20)
File "/data/conda_env/base/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data/conda_env/base/lib/python3.9/site-packages/transformers/generation/utils.py", line 1527, in generate
result = self._greedy_search(
File "/data/conda_env/base/lib/python3.9/site-packages/transformers/generation/utils.py", line 2411, in _greedy_search
outputs = self(
File "/data/conda_env/base/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/conda_env/base/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1196, in forward
outputs = self.model(
File "/data/conda_env/base/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/conda_env/base/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 990, in forward
causal_mask = self._update_causal_mask(attention_mask, inputs_embeds, cache_position)
File "/data/conda_env/base/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Int8LlamaModel' object has no attribute '_update_causal_mask'
What causes this error during inference, and how can it be resolved?
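Reading the traceback: `forward` in transformers' `modeling_llama.py` runs with an `Int8LlamaModel` as `self` and calls `self._update_causal_mask`, which that class does not define. One pattern that would produce exactly this error (an assumption, since the AutoSmoothQuant source is not shown here) is a class that reuses an upstream `forward` without inheriting the helpers it calls. The toy classes below are hypothetical stand-ins, not the real models:

```python
class UpstreamModel:
    """Stand-in for a library model whose forward relies on a private helper."""

    def _update_causal_mask(self, mask):
        return mask

    def forward(self, mask):
        # A newer library version may start calling helpers like this one.
        return self._update_causal_mask(mask)


class QuantizedModel:
    """Stand-in for a custom model that borrows forward without inheriting."""

    forward = UpstreamModel.forward


class PatchedModel:
    """Same as above, but the helper is borrowed too."""

    forward = UpstreamModel.forward
    _update_causal_mask = UpstreamModel._update_causal_mask


def missing_helpers(cls, names):
    """List the helper names that cls does not provide."""
    return [name for name in names if not hasattr(cls, name)]


def try_forward(model):
    """Run forward and return 'ok' or the AttributeError message."""
    try:
        model.forward(None)
    except AttributeError as exc:
        return str(exc)
    return "ok"


print(missing_helpers(QuantizedModel, ["_update_causal_mask"]))
print(try_forward(QuantizedModel()))
print(try_forward(PatchedModel()))
```

If this is what happens in the real code, the two ways out are the ones already suggested in this thread: pin transformers to a version whose `forward` does not call the helper, or update the quantized class in llama.py so it provides what the installed `forward` expects.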