Replies: 1 comment
Hi @tgmaxson, this sounds related to the PyTorch caching allocator, which often keeps memory allocated on the GPU (from the system/driver's perspective) even when it is sitting unused from the Python code's perspective. This is done to speed up repeated allocations of similarly sized buffers, which is the defining memory pattern of most ML workloads; there is a lot of discussion on the PyTorch forums about how to manage, understand, and clear this cache. I don't think there is anything to do here on the NequIP code side, since we rely entirely on PyTorch for memory management.
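For reference, a minimal sketch of the usual pattern for releasing GPU memory between models: drop all Python references to the old calculator, force garbage collection, and then ask PyTorch to release its cached blocks back to the driver. The helper name and filename below are placeholders, and the import path may differ between NequIP versions; only the `torch.cuda` calls are standard PyTorch API.

```python
import gc
import torch
from nequip.ase import NequIPCalculator  # import path is an assumption; adjust to your NequIP version

def load_calculator(model_filename, device="cuda"):
    # Hypothetical helper: build a fresh calculator for each deployed model file.
    return NequIPCalculator.from_deployed_model(model_filename, device=device)

calc = load_calculator("deployed_model.pth")  # placeholder filename
# ... run calculations with `calc` ...

# Release the old model before loading the next one:
del calc
gc.collect()                  # make sure Python has actually dropped the TorchScript module
torch.cuda.empty_cache()      # return cached, unused blocks to the driver

# Sanity check: bytes held by live tensors vs. bytes reserved by the caching allocator
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
```

Note that `empty_cache()` only releases blocks that are no longer referenced by any tensor, so the `del` and `gc.collect()` steps matter if anything (an ASE `Atoms` object, a results cache, etc.) still holds a reference to the calculator.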
As for multiple GPUs: yes, the device string is passed directly to PyTorch, so "cuda:0" and "cuda:1" should work as expected. If not, please file an issue. Thanks!
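As an illustration, a hedged sketch of selecting specific GPUs by index, assuming the device string is forwarded to PyTorch as described above (filenames are placeholders):

```python
from nequip.ase import NequIPCalculator  # import path is an assumption; adjust to your NequIP version

# Device strings follow PyTorch conventions, so a GPU index can be selected directly.
calc_gpu0 = NequIPCalculator.from_deployed_model("deployed_model.pth", device="cuda:0")
calc_gpu1 = NequIPCalculator.from_deployed_model("deployed_model.pth", device="cuda:1")
```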
I am training and loading models using the ASE calculator many times during the course of my work, and it seems that this effectively leaks memory on the GPU, since the TorchScript model somehow stays resident there (even after Python loses its reference to the ASE calculator). I am currently using NequIPCalculator.from_deployed_model(model_filename, device=device) to load models, and I think I need to do one of two things, but am unsure how:
1. Replace the model in the calculator with a new model. Is it possible to simply overwrite an existing calculator rather than reading and creating a new one? The old ones must still be around somehow.
2. Clear the TorchScript module from the calculator / empty the GPU cache before losing the reference to the calculator.
Also, as a final, somewhat related question: I will be able to work with multiple GPUs in the near future. Does the device="cuda" directive pass directly to PyTorch? I am wondering whether "cuda:0" and "cuda:1" will be supported.