Max batches float(inf) handled incorrectly #20565

dannyfriar · 2025-01-29T13:30:08Z

Bug description

When using a dataloader that does not implement `__len__`, Lightning sets `max_batches` to `float("inf")`, which then breaks further on when the evaluation loop converts that value to an int.
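
To make the failure mode concrete, here is a minimal pure-Python sketch of what I believe is happening. `StreamingLoader` and `num_batches_or_inf` are hypothetical stand-ins, not Lightning code; only the final `int(max(...))` expression is taken from the traceback in this issue.

```python
class StreamingLoader:
    """Hypothetical iterable-only loader: defines __iter__ but not __len__."""

    def __iter__(self):
        return iter(range(3))


def num_batches_or_inf(loader):
    # Hypothetical helper (an assumption, not Lightning's implementation):
    # fall back to infinity when the loader has no length.
    try:
        return len(loader)
    except TypeError:
        return float("inf")


max_batches = [num_batches_or_inf(StreamingLoader())]

try:
    int(max(max_batches))  # same expression as in the traceback
    error_message = None
except OverflowError as err:
    error_message = str(err)

print(error_message)  # cannot convert float infinity to integer
```

The point is only that `int()` cannot represent an infinite float, so any code path that casts `max_batches` without checking for infinity will raise.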

What version are you seeing the problem on?

v2.5

How to reproduce the bug

I'm struggling to provide a simple repro, but it happens when loading a checkpoint, i.e. any time self.resetting is True in the eval loop.

Error messages and logs

        trainer.fit(
      File "/venv/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 539, in fit
        call._call_and_handle_interrupt(
      File "/venv/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 47, in _call_and_handle_interrupt
        return trainer_fn(*args, **kwargs)
      File "/venv/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 575, in _fit_impl
        self._run(model, ckpt_path=ckpt_path)
      File "/venv/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 982, in _run
        results = self._run_stage()
      File "/venv/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1026, in _run_stage
        self.fit_loop.run()
      File "/venv/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 216, in run
        self.advance()
      File "/venv/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 455, in advance
        self.epoch_loop.run(self._data_fetcher)
      File "/venv/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 150, in run
        self.advance(data_fetcher)
      File "/venv/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 270, in advance
        self.val_loop.increment_progress_to_evaluation_end()
      File "/venv/lib/python3.10/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 271, in increment_progress_to_evaluation_end
        max_batch = int(max(self.max_batches))
    OverflowError: cannot convert float infinity to integer
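
One possible guard around the failing cast is sketched below. This is an assumption about what a fix could look like, not the maintainers' actual change; `finite_max_batch` is a hypothetical helper name, and returning `None` for the unbounded case is just one choice the caller would have to handle.

```python
import math


def finite_max_batch(max_batches):
    """Return the largest batch count as an int, or None if it is unbounded.

    Hypothetical helper for illustration only; not part of Lightning.
    """
    largest = max(max_batches)
    if math.isinf(largest):
        # Unsized dataloader: there is no finite "last batch" to jump to.
        return None
    return int(largest)


print(finite_max_batch([10, 4.0]))           # 10
print(finite_max_batch([10, float("inf")]))  # None
```

With a check like this, the caller could skip the progress fast-forward when no finite batch count is known instead of raising.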

Environment

Current environment

#- PyTorch Lightning Version (e.g., 2.5.0): 2.5.0
#- PyTorch Version (e.g., 2.5): 2.5
#- Python version (e.g., 3.12): 3.10
#- OS (e.g., Linux): Ubuntu
#- CUDA/cuDNN version: CUDA12, cuDNN9
#- GPU models and configuration: A100
#- How you installed Lightning(`conda`, `pip`, source): pip

More info

No response


dannyfriar added the bug and needs triage labels Jan 29, 2025

github-actions bot added the ver: 2.5.x label Jan 29, 2025
