Adding support for multiple offload backends #73

Open
awnawab opened this issue Jan 21, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@awnawab
Collaborator

awnawab commented Jan 21, 2025

Currently, FIELD_API supports GPU offload only via Nvidia's flavour of OpenACC, with the CUDA runtime API used in certain places. If we are to meaningfully expand support to other offload backends (e.g. OpenMP or even Cray OpenACC), continuing in the current manner (i.e. ifdefs) is unsustainable.

I propose the following two potential solutions to expand offload backend support:

1. Macros

Right now, we write OpenACC directives and OpenACC/CUDA runtime calls directly in the fypp files. Instead, these should be replaced by macros.

For example, the instruction to copy a contiguous chunk of memory to device currently is:

CALL ACC_MEMCPY_TO_DEVICE (DEVPTR , HST (${ar}$), ISIZE)

It should instead become:

$:COPY_TO_DEVICE_1D(DEVPTR , HST (${ar}$), ISIZE)

Each offload backend would then provide its own implementation of COPY_TO_DEVICE_1D. The question then is how the backend implementations should be defined. The simplest approach, and the one with the least code repetition, would be to implement the backends as Python modules. Python modules would let us use polymorphism wherever appropriate, e.g. class NvidiaOpenaccCuda would extend class NvidiaOpenacc.
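A minimal sketch of how such backends might look, assuming each macro is a Python callable that returns the Fortran line to insert. Only the class names, COPY_TO_DEVICE_1D and ACC_MEMCPY_TO_DEVICE come from the proposal; `copy_to_host_1d`, `BACKENDS`, `make_macros` and the emitted CUDA line are hypothetical and only there to illustrate the inheritance.

```python
# Hypothetical sketch of backends as Python classes that return Fortran text.
# Only the class names, COPY_TO_DEVICE_1D and ACC_MEMCPY_TO_DEVICE come from
# the proposal; everything else is an assumption made for illustration.


class NvidiaOpenacc:
    """Emits OpenACC runtime calls."""

    def copy_to_device_1d(self, dev, host, size):
        # The call that is hard-coded in the fypp files today.
        return f"CALL ACC_MEMCPY_TO_DEVICE ({dev}, {host}, {size})"

    def copy_to_host_1d(self, host, dev, size):
        return f"CALL ACC_MEMCPY_FROM_DEVICE ({host}, {dev}, {size})"


class NvidiaOpenaccCuda(NvidiaOpenacc):
    """Overrides only what the CUDA runtime would handle; inherits the rest."""

    def copy_to_device_1d(self, dev, host, size):
        # Placeholder line, not checked against a compiler.
        return f"ISTAT = CUDAMEMCPY ({dev}, {host}, {size}, CUDAMEMCPYHOSTTODEVICE)"


BACKENDS = {
    "nvidia_openacc": NvidiaOpenacc,
    "nvidia_openacc_cuda": NvidiaOpenaccCuda,
}


def make_macros(backend_name):
    """Bind macro names to the methods of the selected backend."""
    backend = BACKENDS[backend_name]()
    return {
        "COPY_TO_DEVICE_1D": backend.copy_to_device_1d,
        "COPY_TO_HOST_1D": backend.copy_to_host_1d,
    }


macros = make_macros("nvidia_openacc_cuda")
print(macros["COPY_TO_DEVICE_1D"]("DEVPTR", "HST", "ISIZE"))  # overridden in the subclass
print(macros["COPY_TO_HOST_1D"]("HST", "DEVPTR", "ISIZE"))    # inherited from NvidiaOpenacc
```

With this shape, adding a backend means adding one class and overriding only the methods that differ, which is where the reduction in code repetition would come from.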

2. Replicating the necessary files

Only three files contain GPU-offload-related instructions:

  • field_RANKSUFF_data_module.fypp
  • dev_alloc_module.fypp
  • host_alloc_module.fypp

We could simply create a copy of each of these files for each backend we are interested in.

Whilst the code may be slightly more readable with solution 2, it will definitely lead to a lot more code replication. Primarily for this reason, I am leaning strongly towards solution 1.

As this would be a big change to FIELD_API, I would really love everyone's input on the above proposal @dareg @pmarguinaud @mlange05 @wertysas.

NB: Both this issue and issue #72 would benefit greatly from a more logical directory structure for FIELD_API than the flat one we currently have. So whilst we discuss the above, I will file a PR to that end.

@awnawab awnawab added the enhancement New feature or request label Jan 21, 2025
@awnawab
Collaborator Author

awnawab commented Jan 23, 2025

I've since realised that, whilst only three files in the "core" library contain offload-related instructions, many of the "utilities" contain them too, so far more than three files are affected. This further strengthens the argument for using macros, as suggested in solution 1.
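One rough way to gauge how many files are affected is to scan the source tree for OpenACC directives and OpenACC/CUDA runtime calls. The sketch below assumes the sources are `.fypp` files under some root directory and uses a deliberately crude pattern; `has_offload` and `find_offload_files` are hypothetical helper names, not part of FIELD_API.

```python
import re
from pathlib import Path

# Crude pattern: "!$acc" directives, acc_* runtime routines, cuda* runtime calls.
# It can over-match unrelated identifiers, so treat hits as candidates only.
OFFLOAD_RE = re.compile(
    r"^\s*!\$acc\b|\bacc_\w+|\bcuda\w+",
    re.IGNORECASE | re.MULTILINE,
)


def has_offload(text):
    """Return True if the text appears to contain offload instructions."""
    return OFFLOAD_RE.search(text) is not None


def find_offload_files(root):
    """Return the sorted .fypp paths under root whose contents match."""
    return sorted(
        path
        for path in Path(root).rglob("*.fypp")
        if has_offload(path.read_text(errors="replace"))
    )
```

Calling `find_offload_files(".")` from the repository root would list the candidate files that a macro-based rewrite needs to touch.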

@pmarguinaud
Collaborator

pmarguinaud commented Jan 30, 2025

Hello Ahmad,

Sorry for the delay in answering your question, but I have been thinking about it anyway.

No simpler solution came to me, so what you suggest is the way to go; it will certainly make the code more complex and harder to understand, but we have no other choice.

Do not hesitate to duplicate some of the files if you think it makes things easier.

Another simple idea I just thought of would be to give the methods of the NVIDIA or AMD classes names similar to those in OpenACC, and likewise for their arguments. This would keep things close to the OpenACC naming with which most of us are familiar.
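As a sketch of that naming idea, the interface below echoes the OpenACC runtime routines `acc_memcpy_to_device` and `acc_memcpy_from_device` in its method and argument names. The class names `OffloadBackend`, `Nvidia` and `Amd` are hypothetical, and the line emitted for AMD assumes a hipfort-style `hipMemcpy` binding purely for illustration.

```python
from abc import ABC, abstractmethod


class OffloadBackend(ABC):
    """Hypothetical interface whose method and argument names echo OpenACC."""

    @abstractmethod
    def memcpy_to_device(self, dest, src, nbytes):
        """Counterpart of acc_memcpy_to_device (dest, src, bytes)."""

    @abstractmethod
    def memcpy_from_device(self, dest, src, nbytes):
        """Counterpart of acc_memcpy_from_device (dest, src, bytes)."""


class Nvidia(OffloadBackend):
    def memcpy_to_device(self, dest, src, nbytes):
        return f"CALL ACC_MEMCPY_TO_DEVICE ({dest}, {src}, {nbytes})"

    def memcpy_from_device(self, dest, src, nbytes):
        return f"CALL ACC_MEMCPY_FROM_DEVICE ({dest}, {src}, {nbytes})"


class Amd(OffloadBackend):
    # Placeholder output, not checked against a compiler.
    def memcpy_to_device(self, dest, src, nbytes):
        return f"ISTAT = HIPMEMCPY ({dest}, {src}, {nbytes}, HIPMEMCPYHOSTTODEVICE)"

    def memcpy_from_device(self, dest, src, nbytes):
        return f"ISTAT = HIPMEMCPY ({dest}, {src}, {nbytes}, HIPMEMCPYDEVICETOHOST)"


for backend in (Nvidia(), Amd()):
    print(backend.memcpy_to_device("DEVPTR", "HST", "ISIZE"))
```

The argument is called `nbytes` rather than `bytes` only to avoid shadowing the Python built-in. Declaring the methods abstract means a backend that forgets one of them fails at instantiation rather than at code-generation time.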


2 participants