Description
The issue could be related to #116. I am adapting Prototypical Networks to my use case. I noticed that you're using an adjusted version of ResNet in your examples. Based on my experiments, this version is not a drop-in replacement for the standard torch ResNet implementation: at the very least, it consumes more GPU memory under the same conditions.
How To Reproduce
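Roughly, the training loop looks like the sketch below. The dataset, the image size, SIZE (the batch size) and the backbone constructor are assumptions reconstructed from the variable names in the traceback, not the original snippet.

# Hypothetical reconstruction of the failing loop; dataset, SIZE and backbone arguments are assumptions.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.datasets import FakeData
from torchvision.transforms import ToTensor

from easyfsl.modules import resnet18  # easyfsl's custom ResNet (assumed import path)

DEVICE = torch.device("cuda")
SIZE = 64  # batch size: SIZE=5 runs fine, larger values trigger the error below

model = resnet18(use_fc=True, num_classes=10).to(DEVICE)  # assumed signature
LOSS_FUNCTION = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

data_loader = DataLoader(
    FakeData(size=512, image_size=(3, 224, 224), num_classes=10, transform=ToTensor()),
    batch_size=SIZE,
)

for images, labels in data_loader:
    optimizer.zero_grad()
    model_output = model(images.to(DEVICE))
    loss = LOSS_FUNCTION(model_output, labels.to(DEVICE))
    loss.backward()  # the OutOfMemoryError below is raised here
    optimizer.step()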
Output:
{
"name": "OutOfMemoryError",
"message": "CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 7.79 GiB of which 6.31 MiB is free. Process 290479 has 6.22 GiB memory in use. Including non-PyTorch memory, this process has 1.55 GiB memory in use. Of the allocated memory 1.36 GiB is allocated by PyTorch, and 52.65 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF",
"stack": "---------------------------------------------------------------------------
OutOfMemoryError Traceback (most recent call last)
Cell In[1], line 29
27 model_output = model(images.to(DEVICE))
28 loss = LOSS_FUNCTION(model_output, labels.to(DEVICE))
---> 29 loss.backward()
31 optimizer.step()
File ~/.pyenv/versions/3.11.6/envs/satellite-dataset/lib/python3.11/site-packages/torch/_tensor.py:492, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
482 if has_torch_function_unary(self):
483 return handle_torch_function(
484 Tensor.backward,
485 (self,),
(...)
490 inputs=inputs,
491 )
--> 492 torch.autograd.backward(
493 self, gradient, retain_graph, create_graph, inputs=inputs
494 )
File ~/.pyenv/versions/3.11.6/envs/satellite-dataset/lib/python3.11/site-packages/torch/autograd/__init__.py:251, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
246 retain_graph = create_graph
248 # The reason we repeat the same comment below is that
249 # some Python versions print out the first line of a multi-line function
250 # calls in the traceback and some print out the last line
--> 251 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
252 tensors,
253 grad_tensors_,
254 retain_graph,
255 create_graph,
256 inputs,
257 allow_unreachable=True,
258 accumulate_grad=True,
259 )
OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 7.79 GiB of which 6.31 MiB is free. Process 290479 has 6.22 GiB memory in use. Including non-PyTorch memory, this process has 1.55 GiB memory in use. Of the allocated memory 1.36 GiB is allocated by PyTorch, and 52.65 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
}
The same code with SIZE=5, or with the "native" ResNet module, doesn't cause the issue.
Additional context
The code above was executed on
I tried Google Colab, and with a T4 16 GB I reached batch sizes of 64 with the easyfsl version of ResNet18 and 512 with the torch version.
I am very much a beginner in ML and may be misusing the framework.
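A quick way to compare peak GPU memory of the two backbones is sketched below (a measurement sketch, not part of the original report; the easyfsl import path and default constructor are assumptions):

# Sketch: compare peak GPU memory of one forward + backward pass for both backbones.
import torch
from torchvision.models import resnet18 as torchvision_resnet18
from easyfsl.modules import resnet18 as easyfsl_resnet18  # assumed import path

def peak_memory_mib(build_backbone, batch_size, image_size=224):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = build_backbone().cuda()
    images = torch.randn(batch_size, 3, image_size, image_size, device="cuda")
    model(images).sum().backward()  # forward + backward, so activations are counted
    return torch.cuda.max_memory_allocated() / 2**20

print("torchvision resnet18:", peak_memory_mib(torchvision_resnet18, batch_size=64))
print("easyfsl resnet18:", peak_memory_mib(easyfsl_resnet18, batch_size=64))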
The custom ResNets in easyfsl come from this implementation, which is now a not-so-recent fork of PyTorch's ResNet, and it is quite possible that its memory usage is suboptimal.
A quick response to this problem would be to make this clear in our custom ResNet's docstring.
A better response would be to conduct a deep study of the differences between this implementation and PyTorch's and find out how to improve our memory usage.
The best response would probably be to reimplement our custom ResNet to extend PyTorch's and reduce the differences between the two to a minimum (see the sketch below).
Note that the last two options could cause unexpected shifts between results obtained with easyfsl and those of other works that use FiveAI's implementation, but it's probably worth it: easyfsl is meant to promote best practices in the field.
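For illustration only, a rough sketch of what that last option could look like; the class name and the use_fc flag are hypothetical, not actual easyfsl code:

# Hypothetical custom ResNet that extends torchvision's implementation and keeps the differences minimal.
import torch
from torchvision.models import ResNet
from torchvision.models.resnet import BasicBlock

class FeatResNet(ResNet):
    """ResNet-18 that can return flat feature vectors instead of classification logits."""

    def __init__(self, num_classes: int = 1000, use_fc: bool = False):
        super().__init__(BasicBlock, [2, 2, 2, 2], num_classes=num_classes)
        self.use_fc = use_fc

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Same path as torchvision's _forward_impl, with an optional stop before the classifier.
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))
        x = torch.flatten(self.avgpool(x), 1)
        return self.fc(x) if self.use_fc else x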
Thank you. I'll explore both architectures and come up with improvements if I find any. For now, I've switched to the native PyTorch implementation; it eats less memory and is way faster, at least in my case.
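For reference, the swap amounts to something like this (assuming, as in easyfsl's examples, a backbone that outputs flat feature vectors):

# Rough sketch of swapping in torchvision's ResNet-18 as the few-shot backbone.
from torch import nn
from torchvision.models import resnet18
from easyfsl.methods import PrototypicalNetworks

backbone = resnet18()
backbone.fc = nn.Identity()  # return 512-d features instead of ImageNet logits
model = PrototypicalNetworks(backbone).to("cuda")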