327 detect architecture features automatically #328
35 changes: 18 additions & 17 deletions README.md
@@ -126,14 +126,10 @@ grbrun ./a.out

In more detail, the steps to follow are:

1. Edit the `include/graphblas/base/config.hpp`. In particular, please ensure
that `config::SIMD_SIZE::bytes` defined in that file is set correctly with
respect to the target architecture.

2. Create an empty directory for building ALP and move into it:
1. Create an empty directory for building ALP and move into it:
`mkdir build && cd build`.

3. Invoke the `bootstrap.sh` script located inside the ALP root directory
2. Invoke the `bootstrap.sh` script located inside the ALP root directory
`<ALP/root/dir>` to generate the build infrastructure via CMake inside the
current directory:

@@ -142,17 +138,17 @@ In more detail, the steps to follow are:
- note: add `--with-lpf=/path/to/lpf/install/dir` if you have LPF installed
and would like to use it.

4. Issue `make -j` to compile the C++11 ALP library for the configured backends.
3. Issue `make -j` to compile the C++11 ALP library for the configured backends.

5. (*Optional*) To later run all unit tests, several datasets must be made
4. (*Optional*) To later run all unit tests, several datasets must be made
available. Please run the `<ALP/root/dir>/tools/downloadDatasets.sh`
script for

a. an overview of datasets required for the basic tests, as well as

b. the option to automatically download them.

6. (*Optional*) To make the ALP documentation, issue `make userdocs`. This
5. (*Optional*) To make the ALP documentation, issue `make userdocs`. This
generates both

a. LaTeX in `<ALP build dir>/docs/user/latex/refman.tex`, and
@@ -162,7 +158,7 @@ In more detail, the steps to follow are:
To build a PDF from the LaTeX sources, cd into the directory mentioned, and
issue `make`.

7. (*Optional*) Issue `make -j smoketests` to run a quick set of functional
6. (*Optional*) Issue `make -j smoketests` to run a quick set of functional
tests. Please scan the output for any failed tests.
If you do this with LPF enabled, and LPF was configured to use an MPI engine
(which is the default), and the MPI implementation used is _not_ MPICH, then
@@ -171,15 +167,15 @@ In more detail, the steps to follow are:
implementation you used, and uncomment the lines directly below each
occurrence.

8. (*Optional*) Issue `make -j unittests` to run an exhaustive set of unit
7. (*Optional*) Issue `make -j unittests` to run an exhaustive set of unit
tests. Please scan the output for any failed tests.
If you do this with LPF enabled, please edit `tests/parse_env.sh` if required
as described in step 6.

9. Issue `make -j install` to install ALP into the install directory configured
8. Issue `make -j install` to install ALP into the install directory configured
during step 2.

10. (*Optional*) Issue `source </path/to/install/dir>/bin/setenv` to make
9. (*Optional*) Issue `source </path/to/install/dir>/bin/setenv` to make
available the `grbcxx` and `grbrun` compiler wrapper and runner.

Congratulations, you are now ready for developing and integrating ALP
@@ -230,6 +226,8 @@ and lists technical papers.
- [Development in ALP](#development-in-alp)
- [Acknowledgements](#acknowledgements)
- [Citing ALP, ALP/GraphBLAS, and ALP/Pregel](#citing-alp-alpgraphblas-and-alppregel)
- [ALP and ALP/GraphBLAS](#alp-and-alpgraphblas)
- [ALP/Pregel](#alppregel)


# Configuration
@@ -259,6 +257,8 @@ classes of backends. The main configuration file is found in
8. type used for indexing nonzeroes, as the `NonzeroIndexType` typedef;
9. index type used for vector coordinates, as the `VectorIndexType` typedef.

The most important parameters (SIMD vector size, cache line size, and L1 data
cache size) are detected automatically during the CMake configuration.
Other configuration values in this file are automatically inferred, are fixed
non-configurable settings, or are presently not used by any ALP backend.
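
As an illustration of what detecting the L1 data cache parameters can involve on Linux, the following is a minimal sketch and not the `l1_cache_info.sh` script shipped with this change. It assumes the standard sysfs cache description under `/sys/devices/system/cpu/cpu0/cache`; the function names `size_to_bytes` and `l1_cache_info` are hypothetical. It prints the `TYPE:`/`SIZE:`/`LINE:` format that the CMake logic expects.

```shell
# Hypothetical sketch: report the L1 data (or unified) cache size and its
# cache line size. The cache root is a parameter so that the function can be
# pointed at a different directory tree than the default sysfs location.

# Convert a sysfs size string such as "32K" or "1M" into bytes.
size_to_bytes() {
    local s="$1"
    case "$s" in
        *K) echo $(( ${s%K} * 1024 )) ;;
        *M) echo $(( ${s%M} * 1024 * 1024 )) ;;
        *)  echo "$s" ;;
    esac
}

# Print TYPE, SIZE (bytes) and LINE (bytes) of the first level-1 cache that
# holds data; return non-zero if no such cache is described.
l1_cache_info() {
    local cache_root="${1:-/sys/devices/system/cpu/cpu0/cache}"
    local idx level type
    for idx in "$cache_root"/index*; do
        [ -r "$idx/level" ] || continue
        level=$(cat "$idx/level")
        type=$(cat "$idx/type")
        if [ "$level" = "1" ] && { [ "$type" = "Data" ] || [ "$type" = "Unified" ]; }; then
            echo "TYPE: $type"
            echo "SIZE: $(size_to_bytes "$(cat "$idx/size")")"
            echo "LINE: $(cat "$idx/coherency_line_size")"
            return 0
        fi
    done
    return 1
}
```

Calling `l1_cache_info || echo "L1 cache info not available"` on a Linux machine prints the three lines when sysfs exposes them. A non-zero exit status corresponds to the case in which the build falls back to default values.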

@@ -418,10 +418,11 @@ large outputs is strongly advisable.

### Compilation

Our backends auto-vectorise, hence please recall step 1 from the quick start
guide, and make sure the `include/graphblas/base/config.hpp` file reflects the
correct value for `config::SIMD_SIZE::bytes`. This value must be updated prior
to the compilation and installation of ALP.
Our backends auto-vectorise using the information in
`include/graphblas/base/config.hpp`, in particular `config::SIMD_SIZE::bytes`.
This and other values are detected automatically for the CPU on which the CMake
configuration runs; users may nevertheless set them to different values
manually prior to the compilation and installation of ALP.

When targeting different architectures with differing SIMD widths, different
ALP installations for different architectures could be maintained.
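
To make the relation between a SIMD ISA and the vector size concrete, here is a small sketch that copies the ISA-to-bytes mapping from `cmake/DetectArchInfo.cmake` in this change. The function names `simd_size_bytes` and `x86_simd_isa` are hypothetical. Classifying the ISA from the `flags` line of `/proc/cpuinfo` (`avx512f`, `avx2`, `avx`) is an assumption for illustration only, and may differ from what the compiled detection program checks.

```shell
# Map a SIMD ISA name to a vector size in bytes, following the values used in
# cmake/DetectArchInfo.cmake. Anything else (including SVE and SVE2, whose
# vector size is implementation-dependent) gets the 64-byte default.
simd_size_bytes() {
    case "$1" in
        AVX512)   echo 64 ;;
        AVX2)     echo 32 ;;
        AVX|NEON) echo 16 ;;
        *)        echo 64 ;;
    esac
}

# Hypothetical helper: pick the widest x86 SIMD ISA from a space-separated
# list of CPU flags, as found in the "flags" line of /proc/cpuinfo.
x86_simd_isa() {
    local flags=" $1 "
    case "$flags" in
        *" avx512f "*) echo "AVX512" ;;
        *" avx2 "*)    echo "AVX2" ;;
        *" avx "*)     echo "AVX" ;;
        *)             return 1 ;;
    esac
}
```

For example, `simd_size_bytes "$(x86_simd_isa "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)")"` prints the vector size the mapping yields for the local x86 CPU. Comparing that number against the value reported at configuration time is a quick sanity check before building for a different target.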
3 changes: 3 additions & 0 deletions cmake/AddGRBInstall.cmake
@@ -33,6 +33,9 @@ set( ALP_UTILS_LIBRARY_OUTPUT_NAME "alp_utils" )
set( BINARY_LIBRARIES_INSTALL_DIR "${CMAKE_INSTALL_PREFIX}/lib" )
set( CMAKE_CONFIGS_INSTALL_DIR "${CMAKE_INSTALL_PREFIX}/cmake" )
set( NAMESPACE_NAME "ALPGraphBLAS")
set( ARCH_DETECT_APPS_DIR ${CMAKE_CURRENT_BINARY_DIR}/src/arch_info )
set( ARCH_DETECT_APPS_INSTALL_DIR ${BIN_INSTALL_DIR}/arch_info )


# installation export unit for ALL targets
install( EXPORT GraphBLASTargets
12 changes: 12 additions & 0 deletions cmake/CompileFlags.cmake
@@ -35,6 +35,10 @@ assert_defined_variables(
TEST_PERFORMANCE_DEFINITIONS TEST_PERFORMANCE_OPTIONS
)

### ARCH INFO DETECTION
include( DetectArchInfo )
assert_valid_variables( SIMD_SIZE L1DCACHE_SIZE CACHE_LINE_SIZE )

# allow only Release, Debug and Coverage
set( CMAKE_CONFIGURATION_TYPES "Release;Debug;Coverage" CACHE STRING
"Add the configurations that we need" FORCE
@@ -60,10 +64,18 @@ endif()

set( COMMON_OPTS "-g" "-Wall" "-Wextra" )

set( arch_defs "_SIMD_SIZE=${SIMD_SIZE};_L1DCACHE_SIZE=${L1DCACHE_SIZE};_CACHE_LINE_SIZE=${CACHE_LINE_SIZE}" )

# cache variable to allow manual tweaks from CMake cache
set_valid_string( COMMON_DEFS_Release "${COMMON_COMPILE_DEFINITIONS}" "" )
set_valid_string( COMMON_DEFS_Debug "${COMMON_COMPILE_DEFINITIONS}" "" )
set_valid_string( COMMON_DEFS_Coverage "${COMMON_COMPILE_DEFINITIONS}" "" )

list( PREPEND COMMON_DEFS_Release ${arch_defs} )
list( PREPEND COMMON_DEFS_Debug ${arch_defs} )
list( PREPEND COMMON_DEFS_Coverage ${arch_defs} )


set_valid_string( COMMON_OPTS_Release "${COMMON_COMPILE_OPTIONS}"
"${COMMON_OPTS}"
)
127 changes: 127 additions & 0 deletions cmake/DetectArchInfo.cmake
@@ -0,0 +1,127 @@
#
# Copyright 2024 Huawei Technologies Co., Ltd.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#[===================================================================[
Detect Architectural Info for the system CPU

Three parameters are detected and used during compilation:
1. maximum supported size of the SIMD vector
2. size of the L1 Data cache (L1D for short)
3. size of the L1D cache line (typically the same for all caches)

If any of this information cannot be gathered from hardware, a default is used.
#]===================================================================]

assert_valid_variables( ARCH_DETECT_APPS_DIR )

set( _supported_arches "x86_64;aarch64" )
if( NOT CMAKE_SYSTEM_PROCESSOR IN_LIST _supported_arches )
message( WARNING "Architecture \"${CMAKE_SYSTEM_PROCESSOR}\" not supported" )
else()
set( _supported_arch ON )
endif()

set( DEFAULT_SIMD_SIZE 64 )
set( DEFAULT_L1CACHE_SIZE 32768 )
set( DEFAULT_CACHE_LINE_SIZE 64 )

if( CMAKE_VERSION VERSION_LESS "3.25.0" )
# old CMake versions have a different signature for try_compile()
# https://cmake.org/cmake/help/latest/command/try_run.html#try-compiling-and-running-source-files
set( _dest ${CMAKE_CURRENT_BINARY_DIR} )
endif()

# compile executable to detect SIMD ISA and run it
set( ARCH_DETECT_APPS_DIR ${CMAKE_CURRENT_BINARY_DIR}/src/arch_info )
set( _simd_detect_destination detect_simd_isa )

set( SIMD_ISA_DETECT_APP OFF )
# compile and also copy the file to a known folder in order to use it in the
# installation infrastructure: the grbcxx script needs it
if( _supported_arch )
try_compile( COMPILED ${_dest} SOURCES ${CMAKE_SOURCE_DIR}/cmake/${CMAKE_SYSTEM_PROCESSOR}_simd_detect.c
COPY_FILE ${ARCH_DETECT_APPS_DIR}/${_simd_detect_destination}
COPY_FILE_ERROR COPY_MSG
)
endif()
if( COMPILED )
# attempt to run the compiled app
execute_process(
COMMAND ${ARCH_DETECT_APPS_DIR}/${_simd_detect_destination}
RESULT_VARIABLE RES
OUTPUT_VARIABLE SIMD_ISA
OUTPUT_STRIP_TRAILING_WHITESPACE
)
endif()

if( NOT COMPILED OR ( NOT RES STREQUAL "0" ) OR COPY_MSG )
# if we could not compile or run, set defaults
set( SIMD_SIZE ${DEFAULT_SIMD_SIZE} )
message( WARNING "Cannot detect SIMD ISA, thus applying default vector size: ${SIMD_SIZE}B" )
else()
# set vector size based on the detected SIMD ISA, and warn in case of SVE or
# SVE2, whose vector size detection is not yet implemented
set( SIMD_ISA_DETECT_APP ${_simd_detect_destination} )
if( SIMD_ISA STREQUAL "SVE" OR SIMD_ISA STREQUAL "SVE2" )
set( SIMD_SIZE 64 )
message( WARNING "Detected SIMD ISA ${SIMD_ISA}, whose size is implementation-dependent and currently not detected. Please, consider filing an issue to the authors. Applying default vector size: ${SIMD_SIZE}B" )
else()
if( SIMD_ISA STREQUAL "AVX512" )
set( SIMD_SIZE 64 )
elseif( SIMD_ISA STREQUAL "AVX2" )
set( SIMD_SIZE 32 )
elseif( SIMD_ISA STREQUAL "AVX" )
set( SIMD_SIZE 16 )
elseif( SIMD_ISA STREQUAL "NEON" )
set( SIMD_SIZE 16 )
endif()
message( "Detected SIMD ISA: ${SIMD_ISA}; vector size : ${SIMD_SIZE}B" )
endif()
endif()

set( L1CACHE_DETECT_APP OFF )
# for L1D information, use a Bash script, so just try to run it
execute_process(
COMMAND ${CMAKE_SOURCE_DIR}/cmake/l1_cache_info.sh
RESULT_VARIABLE RES
OUTPUT_VARIABLE CACHE_DETECT_OUTPUT
OUTPUT_STRIP_TRAILING_WHITESPACE
)
# copy the script to the build infrastructure, for testing and for installation
file( COPY ${CMAKE_SOURCE_DIR}/cmake/l1_cache_info.sh DESTINATION ${ARCH_DETECT_APPS_DIR} )
if( NOT RES STREQUAL "0" )
# could not run properly, set defaults
set( L1DCACHE_SIZE ${DEFAULT_L1CACHE_SIZE} )
set( CACHE_LINE_SIZE ${DEFAULT_CACHE_LINE_SIZE} )
message( WARNING "Cannot detect L1 cache features, thus applying default settings" )
else()
set( L1CACHE_DETECT_APP l1_cache_info.sh )
# parse multi-lines output and get each info; example output:
# TYPE: Data
# SIZE: 32768
# LINE: 64
string( REGEX MATCHALL
"TYPE:[ \t]*(Data|Unified)[ \t\r\n]+SIZE:[ \t]*([0-9]+)[ \t\r\n]+LINE:[ \t]*([0-9]+)[ \t\r\n]*"
MATCH_OUTPUT "${CACHE_DETECT_OUTPUT}"
)
set( L1DCACHE_TYPE ${CMAKE_MATCH_1} )
set( L1DCACHE_SIZE ${CMAKE_MATCH_2} )
set( CACHE_LINE_SIZE ${CMAKE_MATCH_3} )
if( L1DCACHE_TYPE STREQUAL "Unified" )
message( WARNING "The L1 cache is Unified, so it may not be possible to effectively utilize its entire size (${L1DCACHE_SIZE}B) for the data." )
endif()
endif()
message( "L1 cache size: ${L1DCACHE_SIZE}B; cacheline size: ${CACHE_LINE_SIZE}B" )
58 changes: 58 additions & 0 deletions cmake/aarch64_simd_detect.c
@@ -0,0 +1,58 @@

/*
* Copyright 2024 Huawei Technologies Co., Ltd.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include <stdio.h>
#include <sys/auxv.h>

/*
* Check the supported SIMD ISA in an ARM architecture, via getauxval():
* https://man7.org/linux/man-pages/man3/getauxval.3.html
*
* Note that support for SVE2 may be too recent for the kernel/GLIBC version in
* use, hence the #ifdef on HWCAP2_SVE2.
* https://docs.kernel.org/arch/arm64/elf_hwcaps.html
*
* Also note that SVE (and SVE2) has implementation-dependent vector size, whose
* retrieval is currently not implemented; the build infrastructure properly
* warns about this case.
*/

int main() {

#ifdef HWCAP2_SVE2
if( getauxval( AT_HWCAP2 ) & HWCAP2_SVE2 ) {
printf( "SVE2\n" );
return 0;
}
#endif

int retval = 0;
const unsigned long flags = getauxval( AT_HWCAP );
#ifdef HWCAP_SVE
if( flags & HWCAP_SVE ) {
printf( "SVE" );
} else
#endif
if ( flags & HWCAP_ASIMD ) {
printf( "NEON" );
} else {
printf( "no SIMD ISA detected!" );
retval = 1;
}
printf( "\n" );
return retval;
}
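
As a quick cross-check of the program above without compiling anything, the same SVE2 > SVE > NEON precedence can be sketched in shell against the `Features` line of `/proc/cpuinfo`. This is an illustrative assumption, not part of this change: the function name `aarch64_simd_isa` is hypothetical, and it relies on the Linux arm64 feature names `sve2`, `sve` and `asimd`.

```shell
# Hypothetical helper: classify the SIMD ISA from a space-separated list of
# aarch64 CPU features, using the same precedence as the C program
# (SVE2 first, then SVE, then NEON).
aarch64_simd_isa() {
    local features=" $1 "
    case "$features" in
        *" sve2 "*)  echo "SVE2" ;;
        *" sve "*)   echo "SVE" ;;
        *" asimd "*) echo "NEON" ;;
        *)           echo "no SIMD ISA detected!"; return 1 ;;
    esac
}
```

On an aarch64 Linux machine, `aarch64_simd_isa "$(grep -m1 '^Features' /proc/cpuinfo | cut -d: -f2)"` should then print the same name as the compiled detection program.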