-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nonbonded kernel #194
base: master
Are you sure you want to change the base?
Nonbonded kernel #194
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work. I have left most comments inline. I can have a look at the performance too. Also:
- Does it work for 2D systems? You can extract
D
as a type parameter from theSVector
s and use things likezero(SVector{D, T})
rather thanSVector{3, T}(zero(T), zero(T), zero(T))
to make it generic in this sense. You could add a GPU test to the 2D test at the top of test/simulation.jl. - Remove the empty file with a strange name.
- You will need to rebase/merge again from master due to recent changes, sorry about that.
|
||
coords = place_atoms(n_atoms, boundary; min_dist=0.1u"nm") | ||
|
||
if gpu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth having nl
as a Bool
argument to the function and testing both with and without the neighbour list.
@@ -221,8 +221,75 @@ end | |||
) | |||
random_velocities!(s, temp) | |||
|
|||
s_gpu = System( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new tests run similar things, so it might just be worth keeping one for speed.
Nonbonded force calculation using GPU block-based algorithm
This PR introduces a GPU-optimized algorithm for the calculation of nonbonded forces. The algorithm subdivides the system into blocks of 32 particles, evaluates distances between blocks, and computes forces only for interacting blocks within a cutoff radius. The algorithm is incorporated into OpenMM and it is described in a paper by Eastman and Pande (https://onlinelibrary.wiley.com/doi/full/10.1002/jcc.21413).
This new implementation significantly accelerates nonbonded force calculations for large systems as its performance scales linearly with the number of atoms.
Implementation details
w
slightly smaller than the cutoff radius. The sequence obtained is used in the functionskernel_min_max!
,force_kernel!
andenergy_kernel
.kernel_min_max!
: computes the arrays containing minima and maxima of the bounding boxesforce_kernel!
: computes the nonbonded forces between particles within interacting bounding boxesenergy_kernel
: computes the nonbonded potential energyLet
i
andj
be the indices of the bounding boxes and letn_blocks
be the allocated number of blocks of GPU threads, which matches exactly the number of bounding boxes. Each couple of bounding boxes forms a tile and the 32 threads allocated in each block compute the interactions within each tile. The structure offorce_kernel!
andenergy_kernel
is divided in 4 main parts:j < n_blocks && i < j
: computes the forces within 32x32 tiles by shuffling the coordinates, velocities indices of the particles andAtom
objectsj == n_blocks && i < n_blocks
: computes the forces between ther=N%32
particles in the last box and the particles in the other boxes of 32.i == j && i < n_blocks
: computes interaction within boxes of 32i == n_blocks && j == n_blocks
: computes interactions within the last box ofr
particles.In cases 1 and 2, for each tile that has been identified as interacting, the distance of each atom in the first bounding box is computed with respect to the second bounding box and if not within the cutoff a flag is set for that atoms to skip the computation.
Observations and new neighbor finder (GPUNeighborFinder)
n_steps_reorder
ofGPUNeighborFinder
.force_kernel!
andenergy_kernel
make explicit use of the matriceseligible
andspecial
defined inGPUNeighborFinder
GPUNeighborFinder
: new type of neighbor finder for systems on GPU. ForGPUNeighborFinder
systems, the functionfind_neighbors
returnsnothing
as a neighbor list. This because the new calculation does not require previous knowledge of the neighbor list. The new default choices for the setup of the system are displayed below (lines 890-914 ofsetup.jl
):A new struct is introduced to store the forces matrix, the arrays of minima and maxima together with the sequence of atom indices ordered according to the Morton code.
The function
forces_nounits!
takes now aForcesBuffer
object as an input. While the forces matrix is updated at each step of dynamics, the minima and maxima arrays and the Morton sequence are only updated everyn_steps_reorder
steps.Some tests for this GPU implementation are added in the
test/simulation.jl
andtest/protein.jl
.Performance evaluation
Benchmark of
forces(sys)
performed on6mrr_equil.pdb
(protein system with 15954 atoms) for the proposed implementation and the previous implementation in Molly:Float32
Float64
The old implementation still outperforms the new one in real systems simulations where the neighbor list is not computed at every step. However, I believe that small fixes can adjust the behaviour of the kernel in long simulations as the benchmark with the reordering seems encouraging.
The command
potential_energy(sys)
always performs the reordering step in the new implementation since the potential energy will be computed much less frequently than the forces during a simulation.Float32
Float64
A possible setup for a benchmark with
Float64
precision: