Currently, FIELD_API only supports GPU offload via Nvidia's flavour of OpenACC, together with the CUDA runtime API in certain places. If we are to meaningfully expand support to other offload backends (e.g. OpenMP or even Cray OpenACC), continuing in the current manner (i.e. ifdefs) is unsustainable.
I propose the following two potential solutions to expand offload backend support:
1. Macros
Right now, we write OpenACC pragmas and OpenACC/CUDA runtime calls directly in the fypp files. Instead, these should be replaced by macros.
For example, the instruction that copies a contiguous chunk of memory to the device would no longer be written as a backend-specific call, but as a call to a macro, COPY_TO_DEVICE_1D.
Each offload backend would then have an appropriate implementation of COPY_TO_DEVICE_1D. The question then arises as to how the various backend implementations should be defined. The simplest approach, and the one that would lead to the least code repetition, would be to implement the backends as Python modules. Using Python modules would enable the use of polymorphism wherever appropriate, e.g., class NvidiaOpenaccCuda would be an extension of class NvidiaOpenacc.
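As a rough illustration of the Python-module idea, here is a minimal sketch of two backend classes, where NvidiaOpenaccCuda overrides a single method of NvidiaOpenacc and inherits the rest. The method names, the `BACKENDS` registry, `get_backend`, and the emitted Fortran strings are all assumptions made for this example and are not taken from FIELD_API; the real generated code may look quite different.

```python
class NvidiaOpenacc:
    """Hypothetical backend that emits OpenACC runtime calls as Fortran text."""

    name = "nvidia-openacc"

    def copy_to_device_1d(self, dev, host, size):
        # Illustrative only: acc_memcpy_to_device(dest, src, bytes) is an
        # OpenACC runtime routine; the exact call FIELD_API emits is not shown here.
        return f"CALL ACC_MEMCPY_TO_DEVICE({dev}, {host}, {size})"

    def copy_from_device_1d(self, dev, host, size):
        # Destination (host) comes first, mirroring acc_memcpy_from_device.
        return f"CALL ACC_MEMCPY_FROM_DEVICE({host}, {dev}, {size})"


class NvidiaOpenaccCuda(NvidiaOpenacc):
    """Hypothetical backend: same as NvidiaOpenacc, but host-to-device
    copies go through the CUDA runtime instead."""

    name = "nvidia-openacc-cuda"

    def copy_to_device_1d(self, dev, host, size):
        # Only this method is overridden; copy_from_device_1d is inherited.
        return f"ISTAT = CUDAMEMCPY({dev}, {host}, {size}, CUDAMEMCPYHOSTTODEVICE)"


# Hypothetical registry so the build can select a backend by name.
BACKENDS = {cls.name: cls for cls in (NvidiaOpenacc, NvidiaOpenaccCuda)}


def get_backend(name):
    """Return an instance of the backend registered under `name`."""
    return BACKENDS[name]()


if __name__ == "__main__":
    backend = get_backend("nvidia-openacc-cuda")
    print(backend.copy_to_device_1d("DEVPTR", "HSTPTR", "ISIZE"))
    print(backend.copy_from_device_1d("DEVPTR", "HSTPTR", "ISIZE"))
```

A COPY_TO_DEVICE_1D macro in the fypp files could then simply forward its arguments to the selected backend's `copy_to_device_1d`, so that adding a backend means adding a class rather than another set of ifdefs.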
2. Replicating the necessary files
Only three files contain GPU-offload-related instructions:
field_RANKSUFF_data_module.fypp
dev_alloc_module.fypp
host_alloc_module.fypp
We could simply create a copy of each of these files for each backend we are interested in.
Whilst the code may be slightly more readable with solution 2, it will definitely lead to a lot more code duplication. Primarily for this reason, I am leaning strongly towards solution 1.
NB: Both the current issue and issue #72 would benefit greatly from a more logical directory structure of FIELD_API rather than the flat one we currently have. So whilst we discuss the above I will file a PR to that end.
I've since realised that whilst only three files in the "core" library contain offload-related instructions, a lot of the "utilities" contain them too, so far more than three files are affected. This further strengthens the argument for using macros, as suggested in solution 1.
Sorry for the delay in answering your question, but I have been thinking about it anyway.
No simpler solution came to me, so what you suggest is the way to go; it will certainly make the code more complex and harder to understand, but we have no other choice.
Do not hesitate to duplicate some of the files if you think it makes things easier.
Another simple thing I just thought of would be to name the methods of the NVIDIA or AMD classes after their OpenACC counterparts, and likewise for the arguments. This would keep the naming close to OpenACC, with which most of us are familiar.
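To make the naming suggestion concrete, here is a minimal sketch in which the backend methods are named after OpenACC clauses (`copyin`, `update_device`, `update_self`, `delete`), whatever directives they actually emit. The class names and method set are hypothetical and chosen only for illustration; the directive strings are standard OpenACC and OpenMP forms, not code from FIELD_API.

```python
class NvidiaOpenacc:
    """Hypothetical OpenACC backend; method names mirror OpenACC clauses."""

    def copyin(self, var):
        return f"!$acc enter data copyin({var})"

    def update_device(self, var):
        return f"!$acc update device({var})"

    def update_self(self, var):
        return f"!$acc update self({var})"

    def delete(self, var):
        return f"!$acc exit data delete({var})"


class AmdOpenmp:
    """Hypothetical OpenMP backend exposing the same OpenACC-like names."""

    def copyin(self, var):
        return f"!$omp target enter data map(to: {var})"

    def update_device(self, var):
        return f"!$omp target update to({var})"

    def update_self(self, var):
        return f"!$omp target update from({var})"

    def delete(self, var):
        return f"!$omp target exit data map(delete: {var})"


if __name__ == "__main__":
    # The calling code reads the same regardless of the backend selected.
    for backend in (NvidiaOpenacc(), AmdOpenmp()):
        print(backend.copyin("A"))
        print(backend.update_device("A"))
        print(backend.delete("A"))
```

With this convention, someone reading the fypp sources sees familiar OpenACC vocabulary even when the generated code targets OpenMP.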
As this would be a big change to FIELD_API, I would really love everyone's input on the above proposal @dareg @pmarguinaud @mlange05 @wertysas.