
Per-epoch loss computation over just the epoch instead of snapshot of total average #79

Open
largestack opened this issue Feb 10, 2025 · 0 comments

largestack commented Feb 10, 2025

Quick request, if possible. loss/epoch seems to be calculated as the total average loss since the start of training. Would it be possible to change this to record the average step loss within each epoch instead? I think this would better guide everyone in how their latest model is performing, and it better matches what people expect this value to contain.

E.g., something like:

for epoch in range(epoch_to_start, num_train_epochs):
    epoch_losses = []  # reset at the start of each epoch
    for step, batch in enumerate(train_dataloader):
        # ... training loop ...
        current_loss = loss.detach().item()
        epoch_losses.append(current_loss)
        # ... other code ...

    # After all steps in the epoch, compute the epoch's average loss
    avg_epoch_loss = sum(epoch_losses) / len(epoch_losses) if epoch_losses else 0.0
    if len(accelerator.trackers) > 0:
        logs = {"loss/epoch": avg_epoch_loss}
        accelerator.log(logs, step=epoch + 1)
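To illustrate the difference, here is a minimal, self-contained sketch (with made-up per-step losses, not taken from any real training run) contrasting the current metric, an average over all steps since training started, with the proposed per-epoch average:

```python
# Hypothetical per-step losses: 3 epochs x 2 steps, decreasing over time.
step_losses = [[1.0, 0.8], [0.6, 0.4], [0.3, 0.1]]

all_losses = []
cumulative_avgs = []  # average over ALL steps since the start of training
epoch_avgs = []       # average over just the current epoch's steps

for epoch_losses in step_losses:
    all_losses.extend(epoch_losses)
    cumulative_avgs.append(sum(all_losses) / len(all_losses))
    epoch_avgs.append(sum(epoch_losses) / len(epoch_losses))

# The cumulative average lags behind: at the last epoch it is still pulled
# up by the high early losses, while the per-epoch average tracks the
# latest model's behavior.
print(cumulative_avgs)
print(epoch_avgs)
```

At the final epoch the per-epoch average (0.2) reflects the current model, while the cumulative average (about 0.53) is still dominated by early training.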