How to convert the 2.0-51B-hf HuggingFace model for pipeline parallelism (PP) and tensor parallelism (TP) #113
Comments
The hf-version model uses a generic implementation, so we did not provide a dedicated conversion for it.
@18842685792 For the hf version of the model, would you prefer that we provide an already-converted model, or a conversion script?
I have no background in model training or conversion, and the two documents alone leave me with no idea where to start, so I would prefer a CPU-based conversion script.
How much faster is inference after converting for tensor and pipeline parallelism?
Enabling tensor parallelism for 51B-hf does not require a model conversion:

```python
import transformers
import tensor_parallel as tp

tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-13b")
model = transformers.AutoModelForCausalLM.from_pretrained("facebook/opt-13b")  # use opt-125m for testing
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])  # <- each GPU holds half the weights

inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"].to("cuda:0")
outputs = model.generate(inputs, num_beams=5)
print(tokenizer.decode(outputs[0]))  # A cat sat on my lap for a few minutes ...

model(input_ids=inputs, labels=inputs).loss.backward()  # training works as usual
```

You can refer to the code above.
The tensor_parallel issue tracker reports 2-GPU speedup results for llama-7B and OPT, which can serve as a reference.
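To see the speedup on your own hardware rather than relying on reported numbers, you can time generation before and after wrapping the model. The helper below is a minimal sketch: `tokens_per_second` is a hypothetical utility (not part of transformers or tensor_parallel) that times any zero-argument generation callable and reports throughput.

```python
import time

def tokens_per_second(generate_fn, n_new_tokens, n_runs=3):
    """Time a generation callable and report throughput.

    generate_fn: zero-argument callable that generates exactly
    n_new_tokens tokens (e.g. a lambda wrapping model.generate).
    Returns the best tokens/sec over n_runs to reduce timing noise.
    """
    best = 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        generate_fn()
        elapsed = time.perf_counter() - start
        best = max(best, n_new_tokens / elapsed)
    return best

# Usage sketch: run the same prompt and token budget through the
# single-GPU baseline and the tensor-parallel model, then compare.
# (base_model and tp_model are assumed to be loaded as shown earlier.)
#
# n = 64
# tp_tps   = tokens_per_second(lambda: tp_model.generate(inputs, max_new_tokens=n), n)
# base_tps = tokens_per_second(lambda: base_model.generate(inputs, max_new_tokens=n), n)
# print(f"speedup: {tp_tps / base_tps:.2f}x")
```

Measuring with a fixed `max_new_tokens` keeps the two runs comparable; beam search or sampling settings should also match between the baseline and the tensor-parallel run.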
The GitHub repo only provides a conversion procedure for the original 51B model; no conversion procedure is provided for the 2.0-51B-hf version.