Saving raw_model.state_dict() checkpoints #90

Open
anw-g01 opened this issue Jan 28, 2025 · 0 comments
anw-g01 commented Jan 28, 2025

The following line, which defines raw_model, is executed only once, before the training loop begins:

raw_model = model.module if ddp else model # always contains the "raw" unwrapped model

If I'm not mistaken, doesn't this create a subtle bug, since raw_model is never trained in the loop? Only model, which is either the plain model or the DDP()-wrapped model, is the instance that gets trained. Or is raw_model also updated during DDP training?
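A minimal sketch of the aliasing question above, using a hypothetical ToyModel and Wrapper as stand-ins for an nn.Module and the DDP wrapper (real DistributedDataParallel needs process-group initialization, so it isn't used here). If the wrapper stores a reference to the inner model rather than a copy, updates made through the wrapper remain visible through raw_model:

```python
class ToyModel:
    """Stand-in for an nn.Module with a single 'parameter'."""
    def __init__(self):
        self.weight = 0.0

    def state_dict(self):
        return {"weight": self.weight}


class Wrapper:
    """Stand-in for DDP: stores a reference to the inner module, not a copy."""
    def __init__(self, module):
        self.module = module


model = ToyModel()
ddp_model = Wrapper(model)
raw_model = ddp_model.module  # the same object as `model`

# "Train" through the wrapped model:
ddp_model.module.weight += 1.0

# The update is visible through raw_model, because no copy was ever made.
print(raw_model is model)       # True
print(raw_model.state_dict())   # {'weight': 1.0}
```

Whether the real code behaves this way depends on DDP keeping `module` as a live reference to the original model rather than a detached copy.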

If that logic holds, saving model checkpoints would currently be redundant, because raw_model.state_dict() would be static and the same weights would be saved every time:

checkpoint = {
    'model': raw_model.state_dict(),
    'config': raw_model.config,
    ...
    ...
}

As a suggestion, would it be more correct to use:

checkpoint = {
    'model': (model.module if ddp else model).state_dict(),
    ...
    ...
    ...
}