Build fails with kokkos error #62

Open
namehta4 opened this issue Mar 6, 2025 · 0 comments

Hi,

I am trying to build LAMMPS Allegro in a container with the following build script:

FROM nvcr.io/nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
WORKDIR /opt
ENV DEBIAN_FRONTEND noninteractive

RUN \
    apt-get update        &&   \   
    apt-get install --yes      \   
        build-essential autoconf cmake flex bison zlib1g-dev \
        fftw-dev fftw3 apbs libicu-dev libbz2-dev libgmp-dev \
        libboost-all-dev bc libblas-dev liblapack-dev git    \   
        libfftw3-dev automake lsb-core libxc-dev libgsl-dev  \
        unzip libhdf5-serial-dev ffmpeg libcurl4-openssl-dev \
        libboost-dev libboost-system-dev libtool swig        \   
        libboost-filesystem-dev libboost-graph-dev uuid-dev  \
        libboost-regex-dev libedit-dev libyaml-cpp-dev make  \
        python3-yaml automake pkg-config libc6-dev libzmq3-dev \
        libjansson-dev liblz4-dev libarchive-dev python3-pip \
        libsqlite3-dev lua5.1 liblua5.1-dev lua-posix jq     \   
        python3-dev python3-cffi python3-ply python3-sphinx  \
        aspell aspell-en valgrind libyaml-cpp-dev wget vim   \   
        libquadmath0 \
        make libzmq3-dev python3-yaml time valgrind  libeigen3-dev \
        mlocate python3-jsonschema python-is-python3       &&\ 
    apt-get clean all 

RUN apt-get update && apt-get install --yes gpg-agent wget
RUN wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
RUN echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | tee /etc/apt/sources.list.d/oneAPI.list
RUN apt-get update 
RUN apt-get install --yes intel-oneapi-mkl intel-oneapi-mkl-devel && \
    apt-get clean all 


WORKDIR /opt
ARG mpich=4.2.2
ARG mpich_prefix=mpich-$mpich
RUN \
    wget https://www.mpich.org/static/downloads/$mpich/$mpich_prefix.tar.gz && \
    tar xvzf $mpich_prefix.tar.gz                                           && \
    cd $mpich_prefix                                                        && \
    ./configure FFLAGS=-fallow-argument-mismatch FCFLAGS=-fallow-argument-mismatch \
    --prefix=/opt/mpich/install                                             && \
    make -j 16                                                              && \
    make install                                                            && \
    make clean                                                              && \
    cd ..                                                                   && \
    rm -rf $mpich_prefix.tar.gz
ENV PATH=$PATH:/opt/mpich/install/bin
ENV PATH=$PATH:/opt/mpich/install/include
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/mpich/install/lib
RUN /sbin/ldconfig

ENV PATH=$PATH:/usr/local/cuda/lib64/stubs
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64/stubs
ENV PATH=$PATH:/usr/local/cuda-11.8/targets/x86_64-linux/lib/stubs
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.8/targets/x86_64-linux/lib/stubs

RUN ln -s /usr/local/cuda-11.8/targets/x86_64-linux/lib/stubs/libnvidia-ml.so /usr/local/cuda-11.8/targets/x86_64-linux/lib/stubs/libnvidia-ml.so.1 

RUN python -m pip install cffi numpy meson ninja
RUN python -m pip install setuptools
RUN python -m pip install mpi4py -i https://pypi.anaconda.org/mpi4py/simple
RUN python -m pip install numpy scipy matplotlib
RUN python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
RUN python -m pip install clang-format
ENV TORCH_CUDA_ARCH_LIST="8.0 8.6 8.9 9.0"

#Installing lammps
WORKDIR /opt
RUN cd /opt
RUN git clone -b stable_29Aug2024_update1 https://github.com/lammps/lammps  
RUN git clone -b multicut https://github.com/mir-group/pair_allegro.git pair_allegro
RUN cd /opt/pair_allegro && \
    ./patch_lammps.sh /opt/lammps
RUN git clone https://github.com/mir-group/pair_nequip
RUN cd /opt/pair_nequip && \
    cp /opt/pair_nequip/pair_nequip.cpp /opt/lammps/src  && \
    cp /opt/pair_nequip/pair_nequip.h /opt/lammps/src

RUN apt-get install --yes clang-format xxd
RUN wget https://download.pytorch.org/libtorch/cu118/libtorch-shared-with-deps-2.5.1%2Bcu118.zip && \
    unzip libtorch-shared-with-deps-2.5.1+cu118.zip                         && \
    rm -r libtorch-shared-with-deps-2.5.1+cu118.zip                         && \
    mv libtorch libtorch-gpu                                                
ENV PATH=$PATH:/opt/libtorch-gpu/bin
ENV PATH=$PATH:/opt/libtorch-gpu/include
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/libtorch-gpu/lib


ENV PATH=$PATH:/opt/lammps/build/plumed_build-prefix/bin
ENV PATH=$PATH:/opt/lammps/build/plumed_build-prefix/include
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/lammps/build/plumed_build-prefix/lib
ENV PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/opt/lammps/build/plumed_build-prefix/lib/pkgconfig
ENV PLUMED_KERNEL=/opt/lammps/build/plumed_build-prefix/lib/libplumedKernel.so
WORKDIR /opt/lammps
RUN mkdir build
WORKDIR /opt/lammps/build
RUN cmake -DMKL_INCLUDE_DIR=/opt/intel/oneapi/mkl/2025.0/include -DMKL_LIBRARY=/opt/intel/oneapi/mkl/2025.0/lib \
          -D CMAKE_BUILD_TYPE=Release -D CMAKE_PREFIX_PATH=/opt/libtorch-gpu \
          -D CMAKE_INSTALL_PREFIX=/opt/lammps/install -D CMAKE_CXX_STANDARD=17 -D CMAKE_CXX_STANDARD_REQUIRED=ON \
          -D BUILD_MPI=ON -D CMAKE_CXX_COMPILER=/opt/lammps/lib/kokkos/bin/nvcc_wrapper -D BUILD_SHARED_LIBS=ON \
          -D PKG_MANYBODY=ON -D PKG_MOLECULE=ON -D PKG_KSPACE=ON -D PKG_REPLICA=ON -D PKG_REAXFF=ON -D PKG_QEQ=ON \
          -D PKG_PHONON=ON -D PKG_ELECTRODE=yes -D PKG_PLUMED=yes -D DOWNLOAD_PLUMED=yes -D PLUMED_MODE=shared \
          -D PKG_KOKKOS=yes -D Kokkos_ARCH_AMPERE80=ON -D Kokkos_ENABLE_CUDA=yes \
          -D BUILD_SHARED_LIBS=ON \
          -D CMAKE_PREFIX_PATH="/opt/libtorch-gpu;/usr/local/cuda-11.8/compat;/opt/intel/oneapi/mkl/2025.0" ../cmake
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.8/compat
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/oneapi/mkl/2025.0/lib
RUN make verbose=1 -j 16
RUN make install
ENV PATH=/opt/lammps/install/bin:$PATH
ENV PATH=/opt/lammps/install/lib:$PATH
ENV PATH=/opt/lammps/install/include:$PATH
ENV LD_LIBRARY_PATH=/opt/lammps/install/lib:$LD_LIBRARY_PATH

However, the build fails at the final link step for the `lmp` executable, with undefined references to Kokkos symbols in liblammps.so.0:

[100%] Built target lammps
[100%] Building CXX object CMakeFiles/lmp.dir/opt/lammps/src/main.cpp.o
[100%] Linking CXX executable lmp
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Cuda::fence(std::string const&) const'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Impl::fill_host_accessible_header_info(Kokkos::Impl::SharedAllocationRecord<void, void>*, Kokkos::Impl::SharedAllocationHeader&, std::string const&)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Impl::log_warning(std::string const&)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Impl::throw_runtime_exception(std::string const&)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::fence(std::string const&)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Tools::beginParallelFor(std::string const&, unsigned int, unsigned long*)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Profiling::beginDeepCopy(Kokkos_Profiling_SpaceHandle, std::string, void const*, Kokkos_Profiling_SpaceHandle, std::string, void const*, unsigned long)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Impl::safe_throw_allocation_with_header_failure(std::string const&, std::string const&, Kokkos::Experimental::RawMemoryAllocationFailure const&)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Impl::SharedAllocationRecordCommon<Kokkos::HostSpace>::SharedAllocationRecordCommon(Kokkos::HostSpace const&, std::string const&, unsigned long, void (*)(Kokkos::Impl::SharedAllocationRecord<void, void>*))'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Tools::modifyDualView(std::string const&, void const*, bool)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Tools::beginFence(std::string, unsigned int, unsigned long*)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Tools::beginParallelScan(std::string const&, unsigned int, unsigned long*)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Impl::SharedAllocationRecord<void, void>::SharedAllocationRecord(Kokkos::Impl::SharedAllocationHeader*, unsigned long, void (*)(Kokkos::Impl::SharedAllocationRecord<void, void>*), std::string const&)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Tools::beginParallelReduce(std::string const&, unsigned int, unsigned long*)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Impl::SharedAllocationRecordCommon<Kokkos::CudaHostPinnedSpace>::allocate_tracked(Kokkos::CudaHostPinnedSpace const&, std::string const&, unsigned long)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Impl::HostInaccessibleSharedAllocationRecordCommon<Kokkos::CudaSpace>::get_label() const'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Impl::HostInaccessibleSharedAllocationRecordCommon<Kokkos::CudaSpace>::HostInaccessibleSharedAllocationRecordCommon(Kokkos::CudaSpace const&, std::string const&, unsigned long, void (*)(Kokkos::Impl::SharedAllocationRecord<void, void>*))'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Impl::SharedAllocationRecordCommon<Kokkos::HostSpace>::get_label() const'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Tools::syncDualView(std::string const&, void const*, bool)'
/usr/bin/ld: liblammps.so.0: undefined reference to `Kokkos::Profiling::beginParallelFor(std::string const&, unsigned int, unsigned long*)'
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/lmp.dir/build.make:110: lmp] Error 1
make[1]: *** [CMakeFiles/Makefile2:416: CMakeFiles/lmp.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
Error: building at STEP "RUN make verbose=1 -j 16": while running runtime: exit status 2
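
One observation that may help with triage: every missing symbol that mentions a string spells it `std::string` rather than `std::__cxx11::basic_string`. With GNU toolchains that spelling usually indicates code compiled with `_GLIBCXX_USE_CXX11_ABI=0`, so part of the build may be using a different libstdc++ ABI than the Kokkos library. This is a guess that has not been verified against this build. The sketch below is not part of the Dockerfile; it tallies the linker lines by ABI spelling so a longer log can be checked the same way. The function name `classify_undefined_refs` and the category labels are hypothetical.

```python
import re
from collections import Counter

# Matches GNU ld lines of the form:
#   /usr/bin/ld: <object>: undefined reference to `<symbol>'
UNDEF_RE = re.compile(r"undefined reference to `(?P<symbol>.+)'\s*$")


def classify_undefined_refs(log_text):
    """Count undefined-reference lines by how std::string is spelled.

    The category names are illustrative labels, not linker terminology.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = UNDEF_RE.search(line)
        if match is None:
            continue  # not an undefined-reference line
        symbol = match.group("symbol")
        if "std::__cxx11::" in symbol:
            counts["cxx11_abi"] += 1
        elif "std::string" in symbol:
            counts["pre_cxx11_abi"] += 1
        else:
            counts["no_std_string"] += 1
    return counts


if __name__ == "__main__":
    # Three lines copied from the log above, plus the collect2 summary line.
    sample = (
        "/usr/bin/ld: liblammps.so.0: undefined reference to "
        "`Kokkos::fence(std::string const&)'\n"
        "/usr/bin/ld: liblammps.so.0: undefined reference to "
        "`Kokkos::Impl::log_warning(std::string const&)'\n"
        "/usr/bin/ld: liblammps.so.0: undefined reference to "
        "`Kokkos::Impl::SharedAllocationRecordCommon<Kokkos::HostSpace>"
        "::get_label() const'\n"
        "collect2: error: ld returned 1 exit status\n"
    )
    print(dict(classify_undefined_refs(sample)))
```

If nearly all entries fall under `pre_cxx11_abi`, comparing the `_GLIBCXX_USE_CXX11_ABI` setting used for the LAMMPS sources against the one used for Kokkos would be a reasonable next step.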

Could you please help me resolve this?

Thanks!
Neil
