Design new HDF5 interface #5033
Here is a MWE for the hdf5 C++ serial bindings:

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Copyright by The HDF Group. *
* All rights reserved. *
* *
* This file is part of HDF5. The full HDF5 copyright notice, including *
* terms governing use, modification, and redistribution, is contained in *
* the LICENSE file, which can be found at the root of the source code *
* distribution tree, or in https://www.hdfgroup.org/licenses. *
* If you do not have access to either file, you may request a copy from *
* [email protected]. *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
#define BOOST_TEST_MODULE hdf5 test
#define BOOST_TEST_DYN_LINK
#include <boost/test/unit_test.hpp>
#include "H5Cpp.h" // C++ API header file
#include <array>
#include <vector>
BOOST_AUTO_TEST_CASE(attr_basic_write_then_read) {
  const int SPACE1_RANK = 3;
  const hsize_t SPACE1_DIM1 = 3;
  const hsize_t SPACE1_DIM2 = 15;
  const hsize_t SPACE1_DIM3 = 13;
  const int ATTR1_RANK = 1;     // rank of the 1st data stream
  const hsize_t ATTR1_DIM1 = 3; // shape of the 1st data stream
  std::vector<int> attr_data1 = {512, -234, 98123}; // test data for 1st attribute
  const int ATTR2_RANK = 2;     // rank of the 2nd data stream
  const hsize_t ATTR2_DIM1 = 3; // shape of the 2nd data stream
  const hsize_t ATTR2_DIM2 = 2;
  std::vector<std::array<int, ATTR2_DIM2>> attr_data2{
      {7614, -416}, {197814, -3}, {-4, 12}}; // test data for 2nd attribute
  /* Write .h5 file */
  {
    // Create file
    H5::FileAccPropList fapl;
    fapl.setLibverBounds(H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
    H5::H5File fid1("attr_basic.h5", H5F_ACC_TRUNC, H5::FileCreatPropList::DEFAULT, fapl);
    hsize_t dims1[] = {SPACE1_DIM1, SPACE1_DIM2, SPACE1_DIM3};
    hsize_t dims2[] = {ATTR1_DIM1};
    hsize_t dims3[] = {ATTR2_DIM1, ATTR2_DIM2};
    /* Test attribute with dataset */
    // Create a dataset with a dataspace for attributes
    H5::DataSpace ds_space(SPACE1_RANK, dims1);
    H5::DataSet dataset = fid1.createDataSet("Dataset1", H5::PredType::NATIVE_UCHAR, ds_space);
    H5::DataSpace att_space(ATTR1_RANK, dims2);
    // Create two dataset attributes, but only write to one of them
    H5::Attribute ds_attr1 = dataset.createAttribute(
        "dataset/attributes:units:particles/atoms/mass/value",
        H5::PredType::NATIVE_INT, att_space);
    H5::Attribute ds_attr2 = dataset.createAttribute(
        "dataset/attributes:units:software/release/major",
        H5::PredType::NATIVE_INT, att_space);
    // Write attribute information and read it back immediately, without closing
    std::vector<int> read_data1(attr_data1.size()); // buffer
    ds_attr1.write(H5::PredType::NATIVE_INT, attr_data1.data());
    ds_attr1.read(H5::PredType::NATIVE_INT, read_data1.data());
    for (std::size_t i = 0; i < attr_data1.size(); ++i)
      BOOST_CHECK_EQUAL(attr_data1[i], read_data1[i]);
    // Close attributes
    ds_attr1.close();
    ds_attr2.close();
    /* Test attribute with group */
    // Create a group with a dataspace for attributes
    H5::Group group = fid1.createGroup("/Group1");
    H5::DataSpace sid3(ATTR2_RANK, dims3);
    // Create an attribute for the group
    H5::Attribute gr_attr = group.createAttribute("Attr2", H5::PredType::NATIVE_INT, sid3);
    // Check storage size for the attribute before writing
    hsize_t attr_size = gr_attr.getStorageSize();
    BOOST_CHECK_EQUAL(attr_size, (ATTR2_DIM1 * ATTR2_DIM2 * sizeof(int)));
    // Write attribute information
    gr_attr.write(H5::PredType::NATIVE_INT, attr_data2.data());
    // Check storage size for the attribute after writing
    attr_size = gr_attr.getStorageSize();
    BOOST_CHECK_EQUAL(attr_size, (ATTR2_DIM1 * ATTR2_DIM2 * sizeof(int)));
  }
  /* Read .h5 file */
  {
    // Open file
    H5::FileAccPropList fapl;
    fapl.setLibverBounds(H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
    H5::H5File fid1("attr_basic.h5", H5F_ACC_RDWR, fapl);
    /* Test attribute with dataset */
    // Verify the correct number of attributes
    H5::DataSet dataset = fid1.openDataSet("Dataset1");
    int num_ds_attrs = dataset.getNumAttrs();
    BOOST_CHECK_EQUAL(num_ds_attrs, 2);
    // Verify attribute information
    H5::Attribute ds_attr = dataset.openAttribute("dataset/attributes:units:particles/atoms/mass/value");
    std::vector<int> read_data1(attr_data1.size()); // buffer
    ds_attr.read(H5::PredType::NATIVE_INT, read_data1.data());
    for (hsize_t i = 0; i < ATTR1_DIM1; i++)
      BOOST_CHECK_EQUAL(attr_data1[i], read_data1[i]);
    /* Test attribute with group */
    // Verify the correct number of attributes
    H5::Group group = fid1.openGroup("/Group1");
    int num_gr_attrs = group.getNumAttrs();
    BOOST_CHECK_EQUAL(num_gr_attrs, 1);
    // Verify attribute information
    H5::Attribute gr_attr = group.openAttribute("Attr2");
    std::vector<std::array<int, ATTR2_DIM2>> read_data2(ATTR2_DIM1);
    gr_attr.read(H5::PredType::NATIVE_INT, read_data2.data());
    for (hsize_t i = 0; i < ATTR2_DIM1; i++)
      for (hsize_t j = 0; j < ATTR2_DIM2; j++)
        BOOST_CHECK_EQUAL(attr_data2[i][j], read_data2[i][j]);
  }
}

Compilation commands:

git clone -b develop https://github.com/HDFGroup/hdf5
cd hdf5
python3 -m venv venv
. venv/bin/activate
python3 -m pip install 'cmake==3.28' 'ninja==1.11'
mkdir build
cd build
CC=mpicc CXX=mpicxx cmake \
-C ../config/cmake/cacheinit.cmake \
-G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DHDF5_ENABLE_PARALLEL:BOOL=ON \
-DHDF5_BUILD_FORTRAN:BOOL=OFF \
-DHDF5_BUILD_JAVA:BOOL=OFF \
-DHDF5_BUILD_CPP_LIB:BOOL=ON \
-DHDF5_ALLOW_UNSUPPORTED:BOOL=ON \
-DHDF5_PACK_EXAMPLES:BOOL=ON \
-DHDF5_EXPORTED_TARGETS:STRING="hdf5-targets" \
-DMPIEXEC_MAX_NUMPROCS:STRING="2" ..
cmake --build .
./bin/cpp_testhdf5
# here copy-paste the MWE into a file
vi mwe.cpp
mpicxx mwe.cpp -Wall -Wextra -I../src -I../src/H5FDsubfiling/ -I./src -I../c++/src \
-L./bin -lhdf5 -lhdf5_cpp -lboost_unit_test_framework -Wl,-rpath,./bin
./a.out

Reading the file in Python:

import h5py
with h5py.File("attr_basic.h5", "r") as cur:
print(f"{cur.keys()}")
print(f"{cur['Dataset1']=}")
print(f"cur['Dataset1'].attrs={dict(cur['Dataset1'].attrs.items())}")
    print(f"cur['Group1'].attrs={dict(cur['Group1'].attrs.items())}")

Output:
Reading the file in the terminal:

h5dump attr_basic.h5

Output:
TL;DR: ESPResSo needs to rewrite its hdf5 bridge to replace the aging h5xx library.
Background
Here are some of the main hdf5 C++ APIs available:
Requirements:
The hdf5 project is actively maintained; release 2.0.0 is planned for the end of 2025. It uses CMake, which is a requirement for integration in ESPResSo, since these bindings are not header-only. Its C++ bindings currently do not support parallel I/O, which is a hard requirement for ESPResSo.
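Because the C++ bindings lack parallel I/O, any parallel path currently has to go through the C API's MPI-IO driver. Below is a minimal sketch of a collective parallel write; the file and dataset names are illustrative, and it assumes a parallel HDF5 build plus MPI:

```c
#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* open the file collectively with the MPI-IO driver */
  hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
  hid_t file = H5Fcreate("parallel.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

  /* one row per rank; each rank writes its own hyperslab */
  hsize_t dims[2] = {(hsize_t)size, 4};
  hid_t filespace = H5Screate_simple(2, dims, NULL);
  hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_INT, filespace,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  hsize_t offset[2] = {(hsize_t)rank, 0};
  hsize_t count[2] = {1, 4};
  H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
  hid_t memspace = H5Screate_simple(2, count, NULL);

  int buf[4] = {rank, rank, rank, rank};
  hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
  H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE); /* collective transfer mode */
  H5Dwrite(dset, H5T_NATIVE_INT, memspace, filespace, dxpl, buf);

  H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
  H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
  MPI_Finalize();
  return 0;
}
```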
Problem statement
ESPResSo needs to interface with efficient parallel I/O libraries to read and write data in a portable file format, such as HDF5 [1] and GSD [2] for particle data and VTK for lattice data. File formats for particle data need to support variable particle numbers for Monte Carlo, and metadata specifications (e.g. H5MD [3]) for applications that need SI units (e.g. VOTCA [4]). The HDF5 file format fulfills these criteria and is already supported by ESPResSo; however, its scaling is much poorer than MPI-IO in the current implementation, and it relies on a header-only C++ interface that cannot be compiled on HPC systems due to known portability bugs.
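Variable particle numbers map naturally onto chunked datasets with an unlimited dimension, which the HDF5 C API supports directly. A sketch (file and dataset names are made up for illustration):

```c
#include <hdf5.h>

int main(void) {
  hid_t file = H5Fcreate("particles.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

  /* start with 100 particles, allow the particle axis to grow */
  hsize_t dims[2] = {100, 3};
  hsize_t maxdims[2] = {H5S_UNLIMITED, 3};
  hid_t space = H5Screate_simple(2, dims, maxdims);

  /* unlimited dimensions require a chunked layout */
  hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
  hsize_t chunk[2] = {128, 3};
  H5Pset_chunk(dcpl, 2, chunk);

  hid_t dset = H5Dcreate2(file, "positions", H5T_NATIVE_DOUBLE,
                          space, H5P_DEFAULT, dcpl, H5P_DEFAULT);

  /* after a Monte Carlo insertion the dataset can be grown in place */
  hsize_t new_dims[2] = {150, 3};
  H5Dset_extent(dset, new_dims);

  H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space); H5Fclose(file);
  return 0;
}
```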
Roadmap
Investigate suitable hdf5 C++ API replacements for h5xx. If none is found, consider using the hdf5 C API directly. This has the advantage that we don't need CMake integration, don't need an extra library, and header files are already packaged on HPC (both EESSI and Spack) and on most Linux distributions.
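For comparison with the C++ MWE above, the same attribute write expressed with the C API would look roughly like this (sketch only, using the first attribute from the MWE):

```c
#include <hdf5.h>

int main(void) {
  hid_t file = H5Fcreate("attr_basic_c.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

  /* rank-3 dataset, same shape as in the C++ version */
  hsize_t ds_dims[3] = {3, 15, 13};
  hid_t ds_space = H5Screate_simple(3, ds_dims, NULL);
  hid_t dset = H5Dcreate2(file, "Dataset1", H5T_NATIVE_UCHAR, ds_space,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

  /* rank-1 attribute with 3 elements */
  hsize_t attr_dims[1] = {3};
  hid_t attr_space = H5Screate_simple(1, attr_dims, NULL);
  hid_t attr = H5Acreate2(dset, "dataset/attributes:units:particles/atoms/mass/value",
                          H5T_NATIVE_INT, attr_space, H5P_DEFAULT, H5P_DEFAULT);
  int attr_data[3] = {512, -234, 98123};
  H5Awrite(attr, H5T_NATIVE_INT, attr_data);

  H5Aclose(attr); H5Sclose(attr_space);
  H5Dclose(dset); H5Sclose(ds_space); H5Fclose(file);
  return 0;
}
```

The C API is more verbose and requires manual handle management, which is the main motivation for wrapping it in a thin RAII layer.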
The new bridge could be designed like this:

- a File class that can write basic particle properties to a file, like masses and positions
- replace the espresso_hdf5 target with the new one

This project is part of MultiXscale Task 3.4 from Deliverable 3.3 due M30 (June 2025).
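A minimal sketch of what such a File class could look like on top of the C API; all names (namespace, class, method) are hypothetical placeholders, not an agreed design:

```cpp
#include <hdf5.h>
#include <stdexcept>
#include <string>
#include <vector>

namespace espresso_io {

// RAII wrapper around an HDF5 file handle, built directly on the C API
class File {
public:
  explicit File(std::string const &path)
      : m_fid(H5Fcreate(path.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT)) {
    if (m_fid < 0)
      throw std::runtime_error("cannot create " + path);
  }
  File(File const &) = delete;
  File &operator=(File const &) = delete;
  ~File() {
    if (m_fid >= 0)
      H5Fclose(m_fid);
  }

  // write a flat (n_particles x 3) array of positions as a rank-2 dataset
  void write_positions(std::vector<double> const &pos) {
    hsize_t dims[2] = {pos.size() / 3, 3};
    hid_t space = H5Screate_simple(2, dims, nullptr);
    hid_t dset = H5Dcreate2(m_fid, "positions", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, pos.data());
    H5Dclose(dset);
    H5Sclose(space);
  }

private:
  hid_t m_fid;
};

} // namespace espresso_io

int main() {
  espresso_io::File file("demo.h5");
  file.write_positions({0., 0., 0., 1., 1., 1.});
}
```

The RAII design makes handle leaks impossible on the happy path and on exceptions, which the h5xx wrapper also provided; the difference is that nothing here is header-only or templated, so it compiles cleanly on HPC toolchains.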
Footnotes

1. HDF Group, API Reference, Introduction to HDF5.
2. Glotzer Group, GSD documentation, HOOMD Schema.
3. de Buyl et al. 2014, H5MD: A structured, efficient, and portable file format for molecular data, Computer Physics Communications 185(6):1546.
4. Mashayak et al. 2015, Relative Entropy and Optimization-Driven Coarse-Graining Methods in VOTCA, PLoS ONE 10(7):e0131754.