Replaced OpenACC with OpenMP. Updated all build script examples accordingly. Please update your build script!
sumseq committed Aug 26, 2024
1 parent 0abd6b0 commit f8a43cf
Showing 9 changed files with 104 additions and 92 deletions.
37 changes: 18 additions & 19 deletions README.md
@@ -1,17 +1,16 @@
![POT3D](pot3d_logo.png)

-# POT3D: High Performance Potential Field Solver #
-Predictive Science Inc.
-www.predsci.com
+# POT3D: High Performance Potential Field Solver
+[Predictive Science Inc.](https://www.predsci.com)

-## OVERVIEW ##
+## OVERVIEW

-`POT3D` is a Fortran code that computes potential field solutions to approximate the solar coronal magnetic field using observed photospheric magnetic fields as a boundary condition. It can be used to generate potential field source surface (PFSS), potential field current sheet (PFCS), and open field (OF) models. It has been (and continues to be) used for numerous studies of coronal structure and dynamics. The code is highly parallelized using [MPI](https://www.mpi-forum.org) and is GPU-accelerated using Fortran standard parallelism (do concurrent) and [OpenACC](https://www.openacc.org/), along with an option to use the [NVIDIA cuSparse library](https://developer.nvidia.com/cusparse). The [HDF5](https://www.hdfgroup.org/solutions/hdf5) file format is used for input/output.
+`POT3D` is a Fortran code that computes potential field solutions to approximate the solar coronal magnetic field using observed photospheric magnetic fields as a boundary condition. It can be used to generate potential field source surface (PFSS), potential field current sheet (PFCS), and open field (OF) models. It has been (and continues to be) used for numerous studies of coronal structure and dynamics. The code is highly parallelized using [MPI](https://www.mpi-forum.org) and is GPU-accelerated using Fortran standard parallelism (do concurrent) and [OpenMP Target](https://www.openmp.org/) for data movement and device selection, along with an option to use the [NVIDIA cuSparse library](https://developer.nvidia.com/cusparse). The [HDF5](https://www.hdfgroup.org/solutions/hdf5) file format is used for input/output.
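As background for this change, the following is a minimal sketch (not code from POT3D itself; names and sizes are illustrative) of the pattern described above: a `do concurrent` loop for the computation, with OpenMP target directives handling data movement to and from the device. With the NVIDIA compiler this corresponds to flags such as `-stdpar=gpu -mp=gpu -gpu=mem:separate`, as in the GPU build script updated below.

```fortran
program dc_omp_sketch
  use iso_fortran_env, only: real64
  implicit none
  integer, parameter :: n = 1000000
  real(real64), allocatable :: a(:), b(:)
  integer :: i

  allocate (a(n), b(n))
  b = 1.0_real64

  ! Copy b to the device and reserve device storage for a;
  ! with -stdpar=gpu the do concurrent loop below runs on the GPU.
  !$omp target enter data map(to: b) map(alloc: a)

  do concurrent (i = 1:n)
    a(i) = 2.0_real64*b(i)
  end do

  ! Bring the result back and release device storage.
  !$omp target exit data map(from: a) map(delete: b)

  print *, 'a(1) =', a(1)
end program dc_omp_sketch
```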

-`POT3D` is the potential field solver for the WSA model in the CORHEL software suite publicly hosted at the [Community Coordinated Modeling Center (CCMC)](https://ccmc.gsfc.nasa.gov/models/modelinfo.php?model=CORHEL/MAS/WSA/ENLIL).
-A version of `POT3D` that includes GPU-acceleration with both MPI+OpenACC and MPI+[OpenMP](https://www.openmp.org//) was released as part of the Standard Performance Evaluation Corporation's (SPEC) beta version of the [SPEChpc(TM) 2021 benchmark suites](https://www.spec.org/hpc2021).
+`POT3D` is the potential field solver for the WSA/DCHB model in the CORHEL software suite publicly hosted at the [Community Coordinated Modeling Center (CCMC)](https://ccmc.gsfc.nasa.gov/models/modelinfo.php?model=CORHEL/MAS/WSA/ENLIL).
+A version of `POT3D` that includes GPU-acceleration with both MPI+[OpenACC](https://www.openacc.org) and MPI+OpenMP was released as part of the Standard Performance Evaluation Corporation's (SPEC) beta version of the [SPEChpc(TM) 2021 benchmark suites](https://www.spec.org/hpc2021).

-Details of the `POT3D` code can be found in the following publications:
+Details of the `POT3D` code can be found in these publications:

- *Variations in Finite Difference Potential Fields*.
Caplan, R.M., Downs, C., Linker, J.A., and Mikic, Z. [Ap.J. 915,1 44 (2021)](https://iopscience.iop.org/article/10.3847/1538-4357/abfd2f)
@@ -20,15 +19,15 @@ Details of the `POT3D` code can be found in the following publications:

--------------------------------

-## HOW TO BUILD POT3D ##
+## HOW TO BUILD POT3D

Copy a build script from the `build_examples` folder that is closest to your setup to the base directory.
Modify the script to set the `HDF5` library paths/flags and compiler flags compatible with your system environment.
Then, run the script to build POT3D (for example, `./my_build.sh`).

See the multiple build example scripts in the `build_examples` folder for more details.

-### Validate Installation ###
+### Validate Installation

After building the code, you can test it is working by running `./validate.sh`.
This will perform 2 runs of a small case using 1 and 2 MPI ranks respectively.
@@ -41,13 +40,13 @@ Note that these validation runs use `ifprec=1` even if POT3D was build with cuSp

--------------------------------

-## HOW TO USE POT3D ##
+## HOW TO USE POT3D

### Setting Input Options

POT3D uses a namelist in an input text file called `pot3d.dat` to set all parameters of a run. See the provided `pot3d_input_documentation.txt` file for details on the various parameter options. For any run, an input 2D data set in HDF5 format is required for the lower radial magnetic field (`Br`) boundary condition. Examples of this file are contained in the `examples` and `testsuite` folders.
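For readers unfamiliar with Fortran namelist input, here is a minimal sketch of how a file like `pot3d.dat` is read. The group name `topology` and the variables other than `ifprec` are hypothetical placeholders, not POT3D's actual input list; see `pot3d_input_documentation.txt` for the real options.

```fortran
program read_namelist_sketch
  implicit none
  integer :: ifprec = 1               ! preconditioner choice (documented above)
  integer :: nr = 0, nt = 0, np = 0   ! hypothetical grid-size variables
  integer :: iun, ios
  namelist /topology/ ifprec, nr, nt, np   ! hypothetical group name

  ! Open the input file and read the namelist group from it.
  open (newunit=iun, file='pot3d.dat', status='old', action='read')
  read (iun, nml=topology, iostat=ios)
  if (ios /= 0) print *, 'error reading namelist, iostat = ', ios
  close (iun)

  print *, 'ifprec =', ifprec, ' grid:', nr, nt, np
end program read_namelist_sketch
```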

-### Launching the Code ###
+### Launching the Code

To run `POT3D`, set the desired run parameters in a `pot3d.dat` text file, then copy or link the `pot3d` executable into the same directory as `pot3d.dat`
and run the command:
@@ -59,7 +58,7 @@ For example: `mpiexec -np 1024 ./pot3d`
For CPU runs, set `ifprec=2` in the `pot3d.dat` input file.
For GPU runs, set `ifprec=1` in the `pot3d.dat` input file, unless you build with the `cuSparse` library option, in which case you should set `ifprec=2`.

-### Running POT3D on GPUs ###
+### Running POT3D on GPUs

For standard cases, one should launch the code such that the number of MPI ranks per node is equal to the number of GPUs per node
e.g.
@@ -70,7 +69,7 @@ or
If the `cuSparse` library option was used to build the code, than set `ifprec=2` in `pot3d.dat`.
If the `cuSparse` library option was NOT used to build the code, it is critical to set `ifprec=1` for efficient performance.
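The overview notes that OpenMP is also used for device selection. Below is a hypothetical sketch (not POT3D's actual code) of how each MPI rank in a one-rank-per-GPU launch could bind itself to a device using the OpenMP runtime:

```fortran
program rank_to_gpu_sketch
  use mpi
  use omp_lib
  implicit none
  integer :: ierr, node_comm, node_rank, ndev

  call MPI_Init(ierr)

  ! Determine this rank's index within its own node.
  call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                           MPI_INFO_NULL, node_comm, ierr)
  call MPI_Comm_rank(node_comm, node_rank, ierr)

  ! One MPI rank per GPU: cycle node-local ranks over the visible devices.
  ndev = omp_get_num_devices()
  if (ndev > 0) call omp_set_default_device(mod(node_rank, ndev))

  call MPI_Comm_free(node_comm, ierr)
  call MPI_Finalize(ierr)
end program rank_to_gpu_sketch
```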

-### Memory Requirements ###
+### Memory Requirements

To estimate how much memory (RAM) is needed for a run, compute:

@@ -79,22 +78,22 @@ To estimate how much memory (RAM) is needed for a run, compute:
where `nr`, `nt`, and `np` are the chosen problem sizes in the `r`, `theta`, and `phi` dimension.
Note that this estimate is when using `ifprec=1`. If using `ifprec=2`, the required memory is ~2x higher on the CPU, and even higher when using `cuSparse` on the GPU.
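Purely as an illustration of the form such an estimate takes (the factor `nwork` below is an assumed placeholder, not POT3D's actual coefficient), a rough calculation could look like:

```fortran
program memory_estimate_sketch
  use iso_fortran_env, only: real64, int64
  implicit none
  integer(int64), parameter :: nr = 400, nt = 800, np = 1600   ! example grid sizes
  integer(int64), parameter :: nwork = 15   ! ASSUMED number of full-size work arrays
  real(real64) :: gib

  ! 8 bytes per double-precision value, nwork arrays of nr*nt*np points each.
  gib = real(nr*nt*np*8_int64*nwork, real64) / 1024.0_real64**3
  print '(a,f8.2,a)', 'rough memory estimate: ', gib, ' GiB (about 2x more for ifprec=2)'
end program memory_estimate_sketch
```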

-### Solution Output ###
+### Solution Output

Depending on the input parameters, `POT3D` can have various outputs. Typically, the three components of the potential magnetic field is output as `HDF5` files. In every run, the following two text files are output:

- `pot3d.out` An output log showing grid information and magnetic energy diagnostics.
- `timing.out` Time profile information of the run.

-### Helpful Scripts ###
+### Helpful Scripts

Some useful python scripts for reading and plotting the POT3D input data, and reading the output data can be found in the `scripts` folder.

-----------------------------

-## EXAMPLES and TESTSUITE ##
+## EXAMPLES and TESTSUITE

-### Examples ###
+### Examples

In the `examples` folder, we provide ready-to-run examples of three use cases of `POT3D` in the following folders:

@@ -105,7 +104,7 @@ A standard PFCS run using the outer boundary of the PFSS example as its inner bo
3. **`/open_field`**
An example of computing the "open field" model from the solar surface out to 30 Rsun using the same input surface Br as the PFSS example. The magnetic field solution produced is unsigned.

-### Testsuite ###
+### Testsuite

In the `testsuite` folder, we provide test cases of various sizes that can be used to validate and test the performance of `POT3D`.
Each test case contains an `input` folder with the run input files, a `run` folder used to run the test, and a `validation` folder containing the output diagnotics used to validate the test, as well as a text file named `validation_run_information.txt` containing information on how the validation run was computed (system, compiler, number of ranks, etc.) with performance details. Note that all tests are set to use `ifprec=1` only. An option to use `ifprec=2` will be added later.
6 changes: 3 additions & 3 deletions build_examples/build_cpu_mpi+multithread_intel_ubuntu20.04.sh
@@ -27,8 +27,8 @@ FC=mpif90
# the SAME COMPILER used here, and is in the run-time environment.
#################################################################

-HDF5_INCLUDE_DIR="/opt/psi/intel/ext_deps/deps/hdf5/include"
-HDF5_LIB_DIR="/opt/psi/intel/ext_deps/deps/hdf5/lib"
+HDF5_INCLUDE_DIR="<PATH_TO_HDF5>/include"
+HDF5_LIB_DIR="<PATH_TO_HDF5>/lib"

##################################################################
# Please set the HDF5 linker flags to match the installed version.
@@ -40,7 +40,7 @@ HDF5_LIB_FLAGS="-lhdf5_fortran -lhdf5hl_fortran -lhdf5 -lhdf5_hl"
# Please set the compile flags based on your compiler and hardware setup.
###########################################################################

-FFLAGS="-O3 -xHost -assume byterecl -heap-arrays -mp"
+FFLAGS="-O3 -xHost -assume byterecl -heap-arrays -qopenmp -fopenmp-target-do-concurrent -fopenmp-targets=spir64_x86_64"

###########################################################################
# If using NV HPC SDK for GPUs, with CUDA version >= 11.3, you can set
Original file line number Diff line number Diff line change
@@ -27,8 +27,8 @@ FC=mpif90
# the SAME COMPILER used here, and is in the run-time environment.
#################################################################

-HDF5_INCLUDE_DIR="/opt/psi/nv/ext_deps/deps/hdf5/include"
-HDF5_LIB_DIR="/opt/psi/nv/ext_deps/deps/hdf5/lib"
+HDF5_INCLUDE_DIR="<PATH_TO_HDF5>/include"
+HDF5_LIB_DIR="<PATH_TO_HDF5>/lib"

##################################################################
# Please set the HDF5 linker flags to match the installed version.
@@ -40,7 +40,7 @@ HDF5_LIB_FLAGS="-lhdf5_fortran -lhdf5hl_fortran -lhdf5 -lhdf5_hl"
# Please set the compile flags based on your compiler and hardware setup.
###########################################################################

-FFLAGS="-O3 -march=native -stdpar=multicore -acc=multicore"
+FFLAGS="-O3 -march=native -stdpar=multicore -mp=multicore"

###########################################################################
# If using NV HPC SDK for GPUs, with CUDA version >= 11.3, you can set
4 changes: 2 additions & 2 deletions build_examples/build_cpu_mpi-only_intel_ubuntu20.04.sh
@@ -27,8 +27,8 @@ FC=mpif90
# the SAME COMPILER used here, and is in the run-time environment.
#################################################################

-HDF5_INCLUDE_DIR="/opt/psi/intel/ext_deps/deps/hdf5/include"
-HDF5_LIB_DIR="/opt/psi/intel/ext_deps/deps/hdf5/lib"
+HDF5_INCLUDE_DIR="<PATH_TO_HDF5>/include"
+HDF5_LIB_DIR="<PATH_TO_HDF5>/lib"

##################################################################
# Please set the HDF5 linker flags to match the installed version.
4 changes: 2 additions & 2 deletions build_examples/build_cpu_mpi-only_nvidia_ubuntu20.04.sh
@@ -27,8 +27,8 @@ FC=mpif90
# the SAME COMPILER used here, and is in the run-time environment.
#################################################################

-HDF5_INCLUDE_DIR="/opt/psi/nv/ext_deps/deps/hdf5/include"
-HDF5_LIB_DIR="/opt/psi/nv/ext_deps/deps/hdf5/lib"
+HDF5_INCLUDE_DIR="<PATH_TO_HDF5>/include"
+HDF5_LIB_DIR="<PATH_TO_HDF5>/lib"

##################################################################
# Please set the HDF5 linker flags to match the installed version.
8 changes: 4 additions & 4 deletions build_examples/build_gpu_nvidia_ubuntu20.04.sh
@@ -27,8 +27,8 @@ FC=mpif90
# the SAME COMPILER used here, and is in the run-time environment.
#################################################################

-HDF5_INCLUDE_DIR="/opt/psi/nv/ext_deps/deps/hdf5/include"
-HDF5_LIB_DIR="/opt/psi/nv/ext_deps/deps/hdf5/lib"
+HDF5_INCLUDE_DIR="<PATH>/hdf5/include"
+HDF5_LIB_DIR="<PATH>/hdf5/lib"

##################################################################
# Please set the HDF5 linker flags to match the installed version.
@@ -40,7 +40,7 @@ HDF5_LIB_FLAGS="-lhdf5_fortran -lhdf5hl_fortran -lhdf5 -lhdf5_hl"
# Please set the compile flags based on your compiler and hardware setup.
###########################################################################

-FFLAGS="-O3 -march=native -stdpar=gpu -acc=gpu -gpu=nomanaged,nounified -Minfo=accel"
+FFLAGS="-O3 -march=native -stdpar=gpu -acc=gpu -mp=gpu -gpu=ccnative,mem:separate -Minfo=accel"

###########################################################################
# If using NV HPC SDK for GPUs, with CUDA version >= 11.3, you can set
@@ -51,7 +51,7 @@ FFLAGS="-O3 -march=native -stdpar=gpu -acc=gpu -gpu=nomanaged,nounified -Minfo=a
###########################################################################

POT3D_CUSPARSE=1
-CCFLAGS="-O3 -march=native -acc=gpu -gpu=nomanaged,nounified -Minfo=accel"
+CCFLAGS="-O3 -march=native -mp=gpu -gpu=ccnative,mem:separate -Minfo=accel"

###########################################################################
###########################################################################
4 changes: 2 additions & 2 deletions src/lusol_cusparse.c
@@ -314,7 +314,7 @@ void lusol_cusparse(double* restrict x)
//
////////////////////////////////////////////////////////////////////////////////////

-#pragma acc parallel loop deviceptr(x,x_32)
+#pragma omp target teams distribute parallel for is_device_ptr(x,x_32)
for (int i=0;i<N_global;i++){
x_32[i] = (float) x[i];
}
@@ -359,7 +359,7 @@ void lusol_cusparse(double* restrict x)
//
////////////////////////////////////////////////////////////////////////////////////

-#pragma acc parallel loop deviceptr(x,x_32)
+#pragma omp target teams distribute parallel for is_device_ptr(x,x_32)
for (int i=0;i<N_global;i++){
x[i] = (double) x_32[i];
}
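The pragma change above follows the usual OpenACC-to-OpenMP correspondence. For reference, here is a sketch of the same mapping written as Fortran directives on a toy loop (illustrative only; POT3D's own Fortran compute loops use `do concurrent`, as noted in the README):

```fortran
subroutine daxpy_offload (n, a, x, y)
  use iso_fortran_env, only: real64
  implicit none
  integer, intent(in) :: n
  real(real64), intent(in) :: a, x(n)
  real(real64), intent(inout) :: y(n)
  integer :: i

  ! OpenACC form (the style this commit removes):
  !   !$acc parallel loop copyin(x) copy(y)
  !
  ! OpenMP target form (the style this commit adopts):
  !$omp target teams distribute parallel do map(to: x) map(tofrom: y)
  do i = 1, n
    y(i) = a*x(i) + y(i)
  end do
end subroutine daxpy_offload
```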
