I have observed a recent change in `LinearWithGradAccumulationAndAsyncCommunication` that stores the gradient of the weights in `WeightGradStore` as part of the new Zero Bubble Pipeline Parallelism feature (#396):
https://github.com/microsoft/Megatron-DeepSpeed/blob/1280f59c1a65e50d4e174e4195e14f173301a497/megatron/core/tensor_parallel/layers.py#L370
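For context, here is a minimal sketch (not the actual Megatron-DeepSpeed code) of the deferred weight-gradient pattern I am referring to; `WeightGradStoreSketch` and its method names are illustrative only:

```python
import torch

class WeightGradStoreSketch:
    """Simplified stand-in for WeightGradStore: queues deferred weight-grad work."""
    cache = []

    @classmethod
    def put(cls, total_input, grad_output, weight):
        # Defer the weight-gradient matmul instead of computing it inside backward().
        cls.cache.append((total_input, grad_output, weight))

    @classmethod
    def pop_all(cls):
        # Drain the queue: compute and accumulate the deferred weight gradients.
        # In the actual code, this draining appears to happen only in deepspeed_zbh1_engine.
        for total_input, grad_output, weight in cls.cache:
            grad_weight = grad_output.t().matmul(total_input)
            if weight.grad is None:
                weight.grad = grad_weight
            else:
                weight.grad += grad_weight
        cls.cache.clear()
```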
However, the stored gradients are only accessed in `deepspeed_zbh1_engine`:
https://github.com/microsoft/Megatron-DeepSpeed/blob/1280f59c1a65e50d4e174e4195e14f173301a497/megatron/core/pipeline_parallel/deepspeed_zbh1_engine.py#L108
If the Zero Bubble Pipeline Parallelism feature is not enabled, it seems the stored weight gradients are never returned or applied. Is this expected behavior?
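To illustrate the concern, here is a hypothetical gated backward path built on the sketch above; `backward_weight_grad` and `zero_bubble_enabled` are assumed names for illustration, not the real function or flag:

```python
import torch

def backward_weight_grad(total_input, grad_output, weight, zero_bubble_enabled):
    """Hypothetical gated backward path for the weight gradient."""
    if zero_bubble_enabled:
        # Deferred path: the gradient is queued and applied later, when the
        # zero-bubble engine drains the store.
        WeightGradStoreSketch.put(total_input, grad_output, weight)
        return None
    # Eager path: compute and return the weight gradient immediately, as the
    # non-zero-bubble schedule expects.
    return grad_output.t().matmul(total_input)

# Quick check of the concern: with the deferred path but nothing draining the
# store, weight.grad stays None after the "backward" call.
weight = torch.nn.Parameter(torch.zeros(4, 3))
x = torch.randn(2, 3)
g = torch.randn(2, 4)
backward_weight_grad(x, g, weight, zero_bubble_enabled=True)
assert weight.grad is None          # nothing has drained the store yet
WeightGradStoreSketch.pop_all()     # this is what the zbh1 engine does; without it the gradient is lost
assert weight.grad is not None
```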