Bug description
When using a dataloader that doesn't implement __len__, Lightning sets max_batches to float("inf") here, which then breaks further on.
What version are you seeing the problem on?
v2.5
How to reproduce the bug
Struggling to provide a simple repro, but it happens when loading a checkpoint, i.e. any time we have self.resetting as True in the eval loop.
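Since I can't share the actual setup, below is a minimal sketch of the kind of configuration that should hit the same code path. This is an assumption on my part, not a confirmed repro: the key ingredients are a validation DataLoader whose dataset has no __len__ (so its max_batches is recorded as float("inf")) and a fit() resumed from a mid-epoch checkpoint.

```python
# Hedged sketch, not a confirmed repro.
import torch
from torch.utils.data import DataLoader, IterableDataset
import lightning.pytorch as pl


class NoLenDataset(IterableDataset):
    # No __len__, so the DataLoader has no length either.
    def __iter__(self):
        for _ in range(8):
            yield torch.randn(4)


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        return self.layer(batch).sum()

    def validation_step(self, batch, batch_idx):
        self.layer(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


train_loader = DataLoader(NoLenDataset(), batch_size=None)
val_loader = DataLoader(NoLenDataset(), batch_size=None)

# First run: stop partway through the epoch and save a checkpoint.
trainer = pl.Trainer(max_steps=4, val_check_interval=2, limit_train_batches=8)
trainer.fit(BoringModel(), train_loader, val_loader)
trainer.save_checkpoint("partial.ckpt")

# Second run: resuming restores loop state, which is the path where
# increment_progress_to_evaluation_end does int(max(self.max_batches)).
trainer = pl.Trainer(max_steps=8, val_check_interval=2, limit_train_batches=8)
trainer.fit(BoringModel(), train_loader, val_loader, ckpt_path="partial.ckpt")
```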
Error messages and logs
    trainer.fit(
  File "/venv/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 539, in fit
    call._call_and_handle_interrupt(
  File "/venv/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 47, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/venv/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 575, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/venv/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 982, in _run
    results = self._run_stage()
  File "/venv/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1026, in _run_stage
    self.fit_loop.run()
  File "/venv/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 216, in run
    self.advance()
  File "/venv/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 455, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/venv/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 150, in run
    self.advance(data_fetcher)
  File "/venv/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 270, in advance
    self.val_loop.increment_progress_to_evaluation_end()
  File "/venv/lib/python3.10/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 271, in increment_progress_to_evaluation_end
    max_batch = int(max(self.max_batches))
OverflowError: cannot convert float infinity to integer
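If it helps, one possible direction for a fix, sketched under the assumption that an unbounded dataloader simply has no finite evaluation end to fast-forward progress to. safe_max_batch below is a hypothetical helper name, not Lightning API, and this is not the maintainers' actual patch:

```python
import math
from typing import Optional, Sequence, Union


def safe_max_batch(max_batches: Sequence[Union[int, float]]) -> Optional[int]:
    """Largest batch count as an int, or None when any dataloader is
    unbounded, instead of int(float("inf")) raising OverflowError."""
    largest = max(max_batches)
    if math.isinf(largest):
        # Unbounded dataloader: no finite evaluation end to fast-forward to.
        return None
    return int(largest)


# Where the loop currently does int(max(self.max_batches)):
assert safe_max_batch([3, 5]) == 5
assert safe_max_batch([float("inf")]) is None  # current code raises here
```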
Environment
- PyTorch Lightning Version: 2.5.0
- PyTorch Version: 2.5
- Python version: 3.10
- OS: Ubuntu
- CUDA/cuDNN version: CUDA 12, cuDNN 9
- GPU models and configuration: A100
- How you installed Lightning: pip
More info
No response