Have you ever run into the problem where the GPU always hangs during training? Sometimes it happens after a particular data sample is fed in, and sometimes while the video is being read directly. GPU memory is clearly only about half occupied when this happens.
I haven't hit that issue when running in my environment. It could be some kind of version mismatch between CUDA, DeepSpeed, and PyTorch.
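To compare environments, it may help to post the installed versions from both sides. Below is a minimal sketch, assuming the packages are installed under the distribution names `torch` and `deepspeed`; the helper name `collect_versions` and the report keys are hypothetical and only for illustration. DeepSpeed also ships a `ds_report` command that prints a similar, more detailed summary.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "deepspeed")):
    """Return a dict of installed versions; a value is None if it cannot be determined."""
    report = {"python": platform.python_version()}

    # Look up installed distribution versions without importing the packages.
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None

    # The CUDA version PyTorch was built against (None for CPU-only builds
    # or if torch cannot be imported at all).
    try:
        import torch
        report["cuda_torch_build"] = torch.version.cuda
    except Exception:
        report["cuda_torch_build"] = None

    return report


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Pasting this output from a working and a hanging machine side by side makes a version mismatch easier to spot.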
Does the same thing happen when using ZeRO optimization with stage 2?
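For anyone trying this, here is a rough sketch of a DeepSpeed config fragment that selects ZeRO stage 2. The thread does not say which stage or batch size the original run used, so the helper name `make_zero_config` and the micro-batch value are placeholders; only the `zero_optimization.stage` and `train_micro_batch_size_per_gpu` keys come from DeepSpeed's config schema.

```python
import json


def make_zero_config(stage=2, micro_batch_size=1):
    """Build a minimal DeepSpeed config dict that selects a ZeRO stage."""
    if stage not in (0, 1, 2, 3):
        raise ValueError(f"ZeRO stage must be 0, 1, 2, or 3, got {stage}")
    return {
        # Placeholder value; set this to match the actual training run.
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "zero_optimization": {"stage": stage},
    }


if __name__ == "__main__":
    # Print the fragment as JSON so it can be merged into an existing config file.
    print(json.dumps(make_zero_config(stage=2), indent=2))
```

If the hang disappears at stage 2, that would at least narrow the problem down to the behavior of the other stage.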