Multi-GPU support? #125

Open
jchodera opened this issue Oct 14, 2023 · 8 comments

Comments

@jchodera
Member

How can we best support parallelization of ML potentials across GPUs?

We're dealing with models that are small enough to be replicated on each GPU, and only O(N) data (positions, box vectors) needs to be sent and O(N) data (forces) accumulated. Models like ANI should be trivially parallelizable across atoms.

@peastman
Member

OpenMM's infrastructure for parallel execution can in principle be applied to any Force. Internally it creates a separate ComputeContext for each device, and a separate copy of the KernelImpl for each one. All of them get executed in parallel, and any energies and forces they return are summed.

The challenge is figuring out what each of those KernelImpls should do when it gets invoked. For many Forces this is simple. With most bonded forces, we can just divide up the bonds between GPUs, with each one computing a different subset. NonbondedForce is a bit more complicated, but we have ways of doing it.
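
As an illustration of the "divide up the bonds" idea, here is a minimal pure-Python sketch. It is not OpenMM's actual implementation: the function names (`bond_terms`, `compute_on_devices`), the round-robin assignment, and the harmonic bond form are all assumptions chosen for the example, and the "devices" are simulated by a plain loop. It only shows that summing the per-device energies and forces reproduces the single-device result.

```python
import math

def bond_terms(positions, bonds):
    """Energy and per-atom forces for harmonic bonds given as (i, j, k, r0)."""
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i, j, k, r0 in bonds:
        delta = [a - b for a, b in zip(positions[i], positions[j])]
        r = math.sqrt(sum(d * d for d in delta))
        energy += 0.5 * k * (r - r0) ** 2
        for axis in range(3):
            f = -k * (r - r0) * delta[axis] / r  # force on atom i along this axis
            forces[i][axis] += f
            forces[j][axis] -= f
    return energy, forces

def compute_on_devices(positions, bonds, num_devices):
    """Hypothetical split: device d handles bonds d, d + num_devices, ..."""
    total_energy = 0.0
    total_forces = [[0.0, 0.0, 0.0] for _ in positions]
    for device_index in range(num_devices):  # stands in for parallel execution
        subset = bonds[device_index::num_devices]
        energy, forces = bond_terms(positions, subset)
        # Accumulate the partial results, as the summation step would.
        total_energy += energy
        for atom, atom_force in enumerate(forces):
            for axis in range(3):
                total_forces[atom][axis] += atom_force[axis]
    return total_energy, total_forces

positions = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [1.1, 0.9, 0.0], [2.0, 1.0, 0.3]]
bonds = [(0, 1, 100.0, 1.0), (1, 2, 100.0, 1.0), (2, 3, 80.0, 1.0)]
single_energy, single_forces = bond_terms(positions, bonds)
split_energy, split_forces = compute_on_devices(positions, bonds, num_devices=2)
```

Because each bond contributes independently to the energy and forces, any partition of the bond list gives the same totals. Only the load balance depends on how the bonds are assigned.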

What would TorchForce do? It doesn't know anything about the internal structure of the model. It just gets invoked once, taking all coordinates as inputs and producing the total energy as output. So the division of work would have to be done inside the model itself. We could pass in a pair of integers telling it how many devices it was executing on, and the index of the current device. The model would have to decide what to do with those inputs such that each device would do a similar amount of work, and the total energy would add up to the correct amount.
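
A rough sketch of what a model could do with those two integers, assuming its energy is a sum of per-atom terms (as in ANI-style models). Everything here is hypothetical: `atom_slice`, `atomic_energy`, and `partial_energy` are illustrative names, the per-atom energy is a toy inverse-distance sum, and nothing below is part of the TorchForce API. It uses plain Python rather than PyTorch so that it runs anywhere.

```python
import math

def atom_slice(num_atoms, num_devices, device_index):
    """Contiguous, near-equal block of atom indices owned by one device."""
    base, extra = divmod(num_atoms, num_devices)
    start = device_index * base + min(device_index, extra)
    stop = start + base + (1 if device_index < extra else 0)
    return range(start, stop)

def atomic_energy(i, positions):
    """Toy per-atom energy: half of every pair term that involves atom i."""
    return sum(0.5 / math.dist(positions[i], xj)
               for j, xj in enumerate(positions) if j != i)

def partial_energy(positions, num_devices, device_index):
    """Energy contribution of the atoms assigned to this device.

    Every device still needs all positions (to see each atom's neighbors),
    but it only evaluates the atomic energies of its own block.
    """
    owned = atom_slice(len(positions), num_devices, device_index)
    return sum(atomic_energy(i, positions) for i in owned)

positions = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.1, 0.0, 0.2), (0.3, 1.2, 0.0),
             (1.4, 1.1, 0.9), (2.2, 1.3, 0.1), (0.9, 2.0, 1.0)]
num_devices = 3
total = partial_energy(positions, 1, 0)  # single-device reference
summed = sum(partial_energy(positions, num_devices, d) for d in range(num_devices))
```

The contiguous block split keeps the atom counts per device within one of each other. Whether that also balances the actual work depends on the model, for example on how many neighbors each atom has.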

@RaulPPelaez
Contributor

Perhaps this would be something for NNPOps. There we could provide drop-in implementations of selected models that are multi-GPU aware.
This would need to be done on a model-by-model basis.
I will leave this here for reference:
https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html#torch.nn.DataParallel
https://pytorch.org/docs/stable/multiprocessing.html

@xiaowei-xie2

Hi, I was wondering whether there is a way to run REMD (ReplicaExchangeSampler) with TorchForce on multiple GPUs?

@peastman
Member

It should work exactly like any other force. Replica exchange is implemented at a higher level, using multiple Contexts for the replicas. It doesn't care how the forces in each Context are computed.

@xiaowei-xie2

Oh nice! Could you provide a simple example of how to do this? I came across this issue choderalab/openmmtools#648, but could not figure out exactly how to do it.

@peastman
Member

I suggest asking on the openmmtools repo. The question isn't related to this package.

@xiaowei-xie2

Ok, I will do that. Thank you!

@SyntaxSmith

Message-passing GNNs are still a difficult problem for multi-GPU MD: we need to exchange ghost nodes' features between interaction layers.
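
To make the ghost-node point concrete, here is a toy sketch in plain Python. It is a deliberately simplified stand-in, not any real GNN potential: the "layer" just adds neighbor features, the graph is a short chain, and `run_reference` and `run_partitioned` are hypothetical names. Each simulated device owns a block of nodes and must fetch the current features of its ghost nodes (neighbors owned by another device) before every layer.

```python
def run_reference(features, neighbors, num_layers):
    """All nodes on one device: feature <- own feature + sum of neighbor features."""
    for _ in range(num_layers):
        features = {i: features[i] + sum(features[j] for j in neighbors[i])
                    for i in neighbors}
    return features

def run_partitioned(features, neighbors, owned_sets, num_layers):
    """Same layers, with nodes split across simulated devices."""
    owner = {i: d for d, owned in enumerate(owned_sets) for i in owned}
    local = [{i: features[i] for i in owned} for owned in owned_sets]
    for _ in range(num_layers):
        # Halo exchange: every device fetches up-to-date ghost features
        # from the devices that own them. This happens before EACH layer.
        halos = []
        for owned in owned_sets:
            ghosts = {j for i in owned for j in neighbors[i]} - set(owned)
            halos.append({j: local[owner[j]][j] for j in ghosts})
        # Local update of owned nodes only, using owned + ghost features.
        new_local = []
        for d, owned in enumerate(owned_sets):
            visible = {**local[d], **halos[d]}
            new_local.append({i: visible[i] + sum(visible[j] for j in neighbors[i])
                              for i in owned})
        local = new_local
    merged = {}
    for part in local:
        merged.update(part)
    return merged

num_nodes = 8
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < num_nodes]
             for i in range(num_nodes)}
features = {i: i + 1 for i in range(num_nodes)}
owned_sets = [[0, 1, 2, 3], [4, 5, 6, 7]]
reference = run_reference(features, neighbors, num_layers=3)
partitioned = run_partitioned(features, neighbors, owned_sets, num_layers=3)
```

With one exchange per layer, the partitioned result matches the single-device result. An atomwise split with a single up-front copy of positions does not remove this per-layer communication, because each layer's output on a ghost node depends on that node's own neighbors, which may live on yet another device.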


5 participants