System Info
No need.

Who can help?
No response

Information
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)

Reproduction
I used the transformers Trainer to train my model, but with my own DataLoader.
I then profiled training with the PyTorch profiler and found that CPU execution time accounted for a high proportion of the total.
After some investigation, I found that non_blocking was not set when data was transferred from the CPU to the GPU:
https://github.com/huggingface/transformers/blob/v4.49.0/src/transformers/trainer.py#L3625-L3631
The modified code is:
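(The original snippet did not survive extraction. The sketch below is my approximation of the change, modeled on the `_prepare_input`-style recursion in the linked Trainer source; the helper name and structure are illustrative, not the exact transformers code.)

```python
from collections.abc import Mapping

import torch


def prepare_input(data, device):
    """Sketch of a _prepare_input-style helper with non_blocking=True added.

    Recurses into dicts and lists/tuples, moving every tensor to `device`.
    """
    if isinstance(data, Mapping):
        return type(data)({k: prepare_input(v, device) for k, v in data.items()})
    if isinstance(data, (tuple, list)):
        return type(data)(prepare_input(v, device) for v in data)
    if isinstance(data, torch.Tensor):
        # non_blocking=True lets the host-to-device copy overlap with compute
        # when the source tensor lives in pinned (page-locked) host memory.
        return data.to(device=device, non_blocking=True)
    return data
```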
Then I re-profiled my code; the results (profiler output omitted here) showed that performance had greatly improved.
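One caveat worth noting (my addition, not part of the original measurements): non_blocking=True only actually overlaps the host-to-device copy with compute when the source batch is in pinned memory, e.g. via DataLoader(pin_memory=True). A minimal illustrative setup:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative data: pin_memory=True page-locks host batches so that
# .to(device, non_blocking=True) can copy asynchronously on a CUDA stream.
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8, pin_memory=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for features, labels in loader:
    features = features.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward/backward pass would go here
```

Without pinned memory the copy falls back to a synchronous transfer, so the flag alone may not explain the full speedup.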
I'm not sure whether this is a bug in the code or a problem with the way I'm using it, but there is no doubt that setting non_blocking=True brought a large performance improvement to my training.
Looking forward to your reply.
Expected behavior
No need.