
Add compiler output to query_config #4743

Open
rljacob opened this issue Jan 31, 2025 · 6 comments


@rljacob
Member

rljacob commented Jan 31, 2025

Currently, query_config --machines gives this output:

[jacob@chrlogin2 scripts]$ ./query_config --machines frontier
Machine(s)

  frontier : Frontier exascale supercomputer at ORNL. 9408 nodes, Node: 4 AMD MI250X GPUs (2 GCDs) ~ 8 GPUs, 512 GB HDB2E, AMD EPYC 64 cores, 512GB DDR4
      os              Linux
      compilers       gnu,amdclang,crayclang,gnugpu,amdclanggpu,crayclanggpu
      mpilibs         ['mpich']
      pes/node        56
      max_tasks/node  56
      max_gpus/node  0

We would like to add the ability to get more info on the compiler, something like:
./query_config --machines frontier --compiler gnugpu

The info to print out would be the info contained in these xml blocks (for the case of "gnugpu" on E3SM's "frontier" machine):

      <modules compiler="gnu.*">
        <command name="reset"></command>
        <command name="switch">Core Core/24.00</command>
        <command name="switch">PrgEnv-cray PrgEnv-gnu/8.3.3</command>
        <command name="switch">gcc gcc/12.2.0</command>
      </modules>
      <modules compiler="gnugpu">
        <command name="load">craype-accel-amd-gfx90a</command>
        <command name="load">rocm/5.4.0</command>
      </modules>
    <environment_variables compiler=".*gpu.*">
      <env name="NTASKS_PER_GPU">--ntasks-per-gpu=$SHELL{echo "`./xmlquery --value MAX_MPITASKS_PER_NODE`/8"|bc}</env>
      <env name="GPU_BIND_ARGS">--gpu-bind=closest</env>
      <env name="PNETCDF_HINTS">romio_cb_read=disable</env>
      <env name="MPICH_GPU_SUPPORT_ENABLED">0</env>
      <env name="MPICH_CXX">$SHELL{which hipcc}</env>
    </environment_variables>
    <environment_variables compiler=".*gpu.*" DEBUG="TRUE">
      <env name="AMD_LOG_LEVEL">10</env>
      <env name="CRAY_ACC_DEBUG">3</env>
    </environment_variables>
    <environment_variables compiler="gnu.*" mpilib="mpich">
      <env name="ADIOS2_ROOT">$SHELL{if [ -z "$ADIOS2_ROOT" ]; then echo /lustre/orion/cli115/world-shared/frontier/3rdparty/adios2/2.9.1/cray-mpich-8.1.23/gcc-11.2.0; else echo "$ADIOS2_ROOT"; fi}</env>
    </environment_variables>

The printout might look like:

  frontier : compiler: gnugpu
Module commands:
  reset
  switch Core Core/24.00
  switch PrgEnv-cray PrgEnv-gnu/8.3.3
  switch gcc gcc/12.2.0
  load craype-accel-amd-gfx90a
  load rocm/5.4.0
Environment variables:
  NTASKS_PER_GPU: --ntasks-per-gpu=$SHELL{echo "`./xmlquery --value MAX_MPITASKS_PER_NODE`/8"|bc}
  GPU_BIND_ARGS: --gpu-bind=closest
  PNETCDF_HINTS: romio_cb_read=disable
  MPICH_GPU_SUPPORT_ENABLED: 0
  MPICH_CXX:  $SHELL{which hipcc}
  (with DEBUG="TRUE") : 
      AMD_LOG_LEVEL: 10
      CRAY_ACC_DEBUG: 3
  (with mpilib="mpich"):
     ADIOS2_ROOT:  $SHELL{if [ -z "$ADIOS2_ROOT" ]; then echo /lustre/orion/cli115/world-shared/frontier/3rdparty/adios2/2.9.1/cray-mpich-8.1.23/gcc-11.2.0; else echo "$ADIOS2_ROOT"; fi}

The reason is that we are having a hard time getting all the relevant info into the compiler name. See E3SM-Project/E3SM#6773 for background.
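A minimal sketch of how the proposed printout could be assembled, assuming the machine entry is available as an XML string and that a block's compiler attribute is treated as a regex that must match the whole compiler name. The function names, the trimmed sample XML, and the exact matching rule are illustrative assumptions, not the existing query_config code.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for a machine entry; NOT the real config file.
SAMPLE = """
<machine MACH="frontier">
  <module_system>
    <modules compiler="gnu.*">
      <command name="reset"></command>
      <command name="switch">gcc gcc/12.2.0</command>
    </modules>
    <modules compiler="gnugpu">
      <command name="load">rocm/5.4.0</command>
    </modules>
  </module_system>
  <environment_variables compiler=".*gpu.*" DEBUG="TRUE">
    <env name="AMD_LOG_LEVEL">10</env>
  </environment_variables>
  <environment_variables compiler=".*gpu.*">
    <env name="GPU_BIND_ARGS">--gpu-bind=closest</env>
  </environment_variables>
  <environment_variables compiler="crayclang.*">
    <env name="UNRELATED">1</env>
  </environment_variables>
</machine>
"""


def _matches(block, compiler):
    # Assumption: only blocks that carry a compiler attribute are reported,
    # and the attribute is a regex that must match the full compiler name.
    pattern = block.get("compiler")
    return pattern is not None and re.fullmatch(pattern, compiler) is not None


def describe_compiler(machine_xml, compiler):
    """Return the proposed report for one compiler as a single string."""
    root = ET.fromstring(machine_xml)
    lines = [f"  {root.get('MACH')} : compiler: {compiler}", "Module commands:"]
    for block in root.iter("modules"):
        if _matches(block, compiler):
            for cmd in block.findall("command"):
                lines.append(f"  {cmd.get('name')} {cmd.text or ''}".rstrip())

    lines.append("Environment variables:")
    env_blocks = [b for b in root.iter("environment_variables") if _matches(b, compiler)]
    # Stable sort: blocks with no extra conditions come first.
    env_blocks.sort(key=lambda b: len(b.attrib) > 1)
    for block in env_blocks:
        extra = {k: v for k, v in block.attrib.items() if k != "compiler"}
        indent = "  "
        if extra:
            cond = ", ".join(f'{k}="{v}"' for k, v in extra.items())
            lines.append(f"  (with {cond}):")
            indent = "      "
        for env in block.findall("env"):
            lines.append(f"{indent}{env.get('name')}: {env.text or ''}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_compiler(SAMPLE, "gnugpu"))
```

Blocks that carry extra attributes such as DEBUG or mpilib are grouped under a "(with ...)" header, mirroring the example printout in the issue.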

@rljacob
Member Author

rljacob commented Jan 31, 2025

@bartgol what do you think of the above?

@rljacob
Member Author

rljacob commented Jan 31, 2025

For a setting like NTASKS_PER_GPU, instead of literally printing "$SHELL{echo "`./xmlquery --value MAX_MPITASKS_PER_NODE`/8"|bc}", print the value that shell command produces.
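One way this substitution could work is sketched below, assuming a POSIX shell is available. The helper name resolve_shell and the fallback behaviour (leave the literal text in place when the command fails) are assumptions made for illustration. A command such as ./xmlquery may well fail when run outside a case directory, which is why the sketch falls back rather than raising.

```python
import subprocess

MARKER = "$SHELL{"


def resolve_shell(value):
    """Replace each $SHELL{cmd} in value with the stripped stdout of cmd.

    If a command exits non-zero, its $SHELL{...} text is kept unchanged.
    """
    out = []
    i = 0
    while True:
        start = value.find(MARKER, i)
        if start == -1:
            out.append(value[i:])
            break
        out.append(value[i:start])
        # Scan forward to the brace that closes this $SHELL{, allowing nesting.
        depth = 1
        j = start + len(MARKER)
        while j < len(value) and depth:
            if value[j] == "{":
                depth += 1
            elif value[j] == "}":
                depth -= 1
            j += 1
        if depth:
            raise ValueError("unterminated $SHELL{ in " + repr(value))
        cmd = value[start + len(MARKER): j - 1]
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        if result.returncode == 0:
            out.append(result.stdout.strip())
        else:
            out.append(value[start:j])  # keep the literal text on failure
        i = j
    return "".join(out)


if __name__ == "__main__":
    print(resolve_shell("--ntasks-per-gpu=$SHELL{echo $((56/8))}"))
```

The arithmetic example stands in for the xmlquery call in the issue; it only shows the substitution mechanics.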

@rljacob
Member Author

rljacob commented Jan 31, 2025

The "--compiler" argument must be accompanied by a "--machines" argument. It's an error to use "--compiler" on its own.
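This constraint could be enforced right after argument parsing. The sketch below uses a stand-alone argparse parser; how the real query_config defines its options is not shown in this thread, so the option definitions and the error message here are hypothetical.

```python
import argparse


def parse_args(argv):
    # Hypothetical, simplified parser; the real script has more options.
    parser = argparse.ArgumentParser(prog="query_config")
    parser.add_argument("--machines", help="machine name to describe")
    parser.add_argument("--compiler", help="compiler to describe on that machine")
    args = parser.parse_args(argv)
    if args.compiler and not args.machines:
        # parser.error prints usage plus the message and exits with status 2.
        parser.error("--compiler requires --machines <machine_name>")
    return args


if __name__ == "__main__":
    print(parse_args(["--machines", "frontier", "--compiler", "gnugpu"]))
```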

@bartgol
Contributor

bartgol commented Jan 31, 2025

So you would print the content of the modules and environment_variables sections that match the compiler? Seems reasonable, but there may be a few details to clarify:

  • What do we do with sections that also have other filters, such as "mpilib" or "BUILD_THREADED"? One option is to print them all (but highlight that they depend on additional options). Not printing them at all may be too restrictive. Another option is to let the user specify additional filters, such as --filters mpilib=mpich BUILD_THREADED=false. These should not be hard to parse.
  • So running ./query_config --machines frontier will print LESS than ./query_config --machines frontier --compiler crayclang? This is somewhat the opposite of what one would expect, since fewer filters should give "more matches". However, it makes sense not to print lots of configs when no compiler is given. That's why, if the compiler is not specified, I would print something like
Modules commands:
     UNKNOWN
     requires '--compiler <compiler_name>' option
Environment variables:
    UNKNOWN
    requires '--compiler <compiler_name>' option
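The --filters idea from the first bullet could be parsed roughly as follows. This is only a sketch: the function names are made up, values are compared case-insensitively as plain strings (an assumption, since the XML uses DEBUG="TRUE" while the example filter uses BUILD_THREADED=false), and a block that does not mention a filtered attribute is treated as matching.

```python
def parse_filters(tokens):
    """Turn ['mpilib=mpich', 'BUILD_THREADED=false'] into a dict."""
    filters = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"expected KEY=VALUE, got {token!r}")
        filters[key] = value
    return filters


def block_selected(block_attrs, filters):
    """True unless the block sets a filtered attribute to a different value."""
    return all(
        block_attrs.get(key, value).lower() == value.lower()
        for key, value in filters.items()
    )


if __name__ == "__main__":
    chosen = parse_filters(["mpilib=mpich", "BUILD_THREADED=false"])
    print(block_selected({"compiler": "gnu.*", "mpilib": "mpich"}, chosen))
```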

@rljacob
Member Author

rljacob commented Feb 7, 2025

My example above shows how the DEBUG=TRUE options, for example, would be indicated in the output.

It's a command line argument, not a filter. So more arguments equals more detail. I don't think that's counter-intuitive.

@bartgol
Contributor

bartgol commented Feb 10, 2025

Ah, I somehow missed the last few lines. So the options that depend on further filters are printed along with what option is needed to get them. Yes, I think that's a good format.
