diff --git a/docs/how-to/3rd-party/index.rst b/docs/how-to/3rd-party/index.rst new file mode 100644 index 00000000..390b42a5 --- /dev/null +++ b/docs/how-to/3rd-party/index.rst @@ -0,0 +1,17 @@ +Deep learning guide +################### + +The following sections cover the different framework installations for ROCm and +deep-learning applications. The following image provides +the sequential flow for the use of each framework. Refer to the ROCm Compatible +Frameworks Release Notes for each framework's most current release notes at +:doc:`/reference/3rd-party-support-matrix` + +.. image:: /data/install/magma-install/magma005.png + +Frameworks installation +*********************** + +- :doc:`pytorch-install` +- :doc:`tensorflow-install` +- :doc:`magma-install` diff --git a/docs/how-to/3rd-party/magma-install.md b/docs/how-to/3rd-party/magma-install.md deleted file mode 100644 index e0be3094..00000000 --- a/docs/how-to/3rd-party/magma-install.md +++ /dev/null @@ -1,64 +0,0 @@ -# MAGMA installation for ROCm - -## MAGMA for ROCm - -Matrix Algebra on GPU and Multicore Architectures (MAGMA) is a -collection of next-generation dense linear algebra libraries that is designed -for heterogeneous architectures, such as multiple GPUs and multi- or many-core -CPUs. - -MAGMA provides implementations for CUDA, HIP, Intel Xeon Phi, and OpenCL™. For -more information, refer to -[https://icl.utk.edu/magma/index.html](https://icl.utk.edu/magma/index.html). - -### Using MAGMA for PyTorch - -Tensor is fundamental to deep-learning techniques because it provides extensive -representational functionalities and math operations. This data structure is -represented as a multidimensional matrix. MAGMA accelerates tensor operations -with a variety of solutions including driver routines, computational routines, -BLAS routines, auxiliary routines, and utility routines. - -### Building MAGMA from source - -To build MAGMA from the source, follow these steps: - -1. 
In the event you want to compile only for your uarch, use: - - ```bash - export PYTORCH_ROCM_ARCH= - ``` - - `` is the architecture reported by the `rocminfo` command. - -2. Use the following: - - ```bash - export PYTORCH_ROCM_ARCH= - - # "install" hipMAGMA into /opt/rocm/magma by copying after build - git clone https://bitbucket.org/icl/magma.git - pushd magma - # Fixes memory leaks of MAGMA found while executing linalg UTs - git checkout 5959b8783e45f1809812ed96ae762f38ee701972 - cp make.inc-examples/make.inc.hip-gcc-mkl make.inc - echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc - echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc - echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc - export PATH="${PATH}:/opt/rocm/bin" - if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then - amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'` - else - amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs` - fi - for arch in $amdgpu_targets; do - echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc - done - # hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition - sed -i 's/^FOPENMP/#FOPENMP/g' make.inc - make -f make.gen.hipMAGMA -j $(nproc) - LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda - make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda - popd - mv magma /opt/rocm - ``` diff --git a/docs/how-to/3rd-party/magma-install.rst b/docs/how-to/3rd-party/magma-install.rst new file mode 100644 index 00000000..b3f7b6a0 --- /dev/null +++ b/docs/how-to/3rd-party/magma-install.rst @@ -0,0 +1,68 @@ +MAGMA installation for ROCm +########################### + +MAGMA for ROCm +************** + +Matrix Algebra on GPU and Multicore Architectures (MAGMA) is a +collection of next-generation dense linear algebra libraries that is designed +for heterogeneous architectures, such 
as multiple GPUs and multi- or many-core +CPUs. + +MAGMA provides implementations for CUDA, HIP, Intel Xeon Phi, and OpenCL™. For +more information, refer to +`https://icl.utk.edu/magma/index.html `_. + +Using MAGMA for PyTorch +======================= + +Tensor is fundamental to deep-learning techniques because it provides extensive +representational functionalities and math operations. This data structure is +represented as a multidimensional matrix. MAGMA accelerates tensor operations +with a variety of solutions including driver routines, computational routines, +BLAS routines, auxiliary routines, and utility routines. + +Building MAGMA from source +========================== + +To build MAGMA from the source, follow these steps: + +1. In the event you want to compile only for your uarch, use: + + .. code-block:: shell + + export PYTORCH_ROCM_ARCH= + + ```` is the architecture reported by the ``rocminfo`` command. + +2. Use the following: + + .. code-block:: shell + + export PYTORCH_ROCM_ARCH= + + # "install" hipMAGMA into /opt/rocm/magma by copying after build + git clone https://bitbucket.org/icl/magma.git + pushd magma + # Fixes memory leaks of MAGMA found while executing linalg UTs + git checkout 5959b8783e45f1809812ed96ae762f38ee701972 + cp make.inc-examples/make.inc.hip-gcc-mkl make.inc + echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc + echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc + echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc + export PATH="${PATH}:/opt/rocm/bin" + if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then + amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'` + else + amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs` + fi + for arch in $amdgpu_targets; do + echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc + done + # hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler 
may attempt to match with host definition + sed -i 's/^FOPENMP/#FOPENMP/g' make.inc + make -f make.gen.hipMAGMA -j $(nproc) + LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda + make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda + popd + mv magma /opt/rocm diff --git a/docs/how-to/3rd-party/pytorch-install.md b/docs/how-to/3rd-party/pytorch-install.md deleted file mode 100644 index e4ae7561..00000000 --- a/docs/how-to/3rd-party/pytorch-install.md +++ /dev/null @@ -1,446 +0,0 @@ -# Installing PyTorch for ROCm - -[PyTorch](https://pytorch.org/) is an open-source tensor library designed for deep learning. PyTorch on -ROCm provides mixed-precision and large-scale training using our -[MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen) and -[RCCL](https://github.com/ROCmSoftwarePlatform/rccl) libraries. - -To install [PyTorch for ROCm](https://pytorch.org/blog/pytorch-for-amd-rocm-platform-now-available-as-python-package/), you have the following options: - -* [Use a Docker image with PyTorch pre-installed](#using-a-docker-image-with-pytorch-pre-installed) - (recommended) -* [Use a wheels package](#using-a-wheels-package) -* [Use the PyTorch ROCm base Docker image](#using-the-pytorch-rocm-base-docker-image) -* [Use the PyTorch upstream Docker file](#using-the-pytorch-upstream-docker-file) - -For hardware, software, and third-party framework compatibility between ROCm and PyTorch, refer to: - -* [GPU and OS support (Linux)](../about/compatibility/linux-support.md) -* [Compatibility](../about/compatibility/3rd-party-support-matrix.md) - -## Using a Docker image with PyTorch pre-installed - -1. Download the latest public PyTorch Docker image - ([https://hub.docker.com/r/rocm/pytorch](https://hub.docker.com/r/rocm/pytorch)). - - ```bash - docker pull rocm/pytorch:latest - ``` - - You can also download a specific and supported configuration with different user-space ROCm - versions, PyTorch versions, and operating systems. - -2. 
Start a Docker container using the image. - - ```bash - docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \ - --device=/dev/kfd --device=/dev/dri --group-add video \ - --ipc=host --shm-size 8G rocm/pytorch:latest - ``` - - :::{note} - This will automatically download the image if it does not exist on the host. You can also pass the `-v` - argument to mount any data directories from the host onto the container. - ::: - -(install_pytorch_wheels)= - -## Using a wheels package - -PyTorch supports the ROCm platform by providing tested wheels packages. To access this feature, go -to [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/). For the correct -wheels command, you must select 'Linux', 'Python', 'pip', and 'ROCm' in the matrix. - -1. Choose one of the following three options: - - **Option 1:** - - a. Download a base Docker image with the correct user-space ROCm version. - | Base OS | Docker image | Link to Docker image| - |----------------|-----------------------------|----------------| - | Ubuntu 20.04 | `rocm/dev-ubuntu-20.04` | [https://hub.docker.com/r/rocm/dev-ubuntu-20.04](https://hub.docker.com/r/rocm/dev-ubuntu-20.04) - | Ubuntu 22.04 | `rocm/dev-ubuntu-22.04` | [https://hub.docker.com/r/rocm/dev-ubuntu-22.04](https://hub.docker.com/r/rocm/dev-ubuntu-22.04) - | CentOS 7 | `rocm/dev-centos-7` | [https://hub.docker.com/r/rocm/dev-centos-7](https://hub.docker.com/r/rocm/dev-centos-7) - - b. Pull the selected image. - - ```bash - docker pull rocm/dev-ubuntu-20.04:latest - ``` - - c. Start a Docker container using the downloaded image. 
- - ```bash - docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/dev-ubuntu-20.04:latest - ``` - - **Option 2:** - - Select a base OS Docker image (Check [OS compatibility](../about/compatibility/linux-support.md)) - - Pull selected base OS image (Ubuntu 20.04 for example) - - ```docker - docker pull ubuntu:20.04 - ``` - - Start a Docker container using the downloaded image - - ```docker - docker run -it --device=/dev/kfd --device=/dev/dri --group-add video ubuntu:20.04 - ``` - - Install ROCm using the directions in the [Installation section](./linux/install.md). - - **Option 3:** - - Install on bare metal. Check [OS compatibility](../about/compatibility/linux-support.md) and install ROCm using the - directions in the [Installation section](./linux/install.md). - -2. Install the required dependencies for the wheels package. - - ```bash - sudo apt update - sudo apt install libjpeg-dev python3-dev python3-pip - pip3 install wheel setuptools - ``` - -3. Install `torch`, `torchvision`, and `torchaudio`, as specified in the - [installation matrix](https://pytorch.org/get-started/locally/). - - :::{note} - The following command uses the ROCm 5.6 PyTorch wheel. If you want a different version of ROCm, - modify the command accordingly. - ::: - - ```bash - pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6/ - ``` - -4. (Optional) Use MIOpen kdb files with ROCm PyTorch wheels. - - PyTorch uses [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen) for machine learning - primitives, which are compiled into kernels at runtime. Runtime compilation causes a small warm-up - phase when starting PyTorch, and MIOpen kdb files contain precompiled kernels that can speed up - application warm-up phases. For more information, refer to the - {doc}`MIOpen installation page `. - - MIOpen kdb files can be used with ROCm PyTorch wheels. 
However, the kdb files need to be placed in - a specific location with respect to the PyTorch installation path. A helper script simplifies this task by - taking the ROCm version and GPU architecture as inputs. This works for Ubuntu and CentOS. - - You can download the helper script here: - [install_kdb_files_for_pytorch_wheels.sh](https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/files/ install_kdb_files_for_pytorch_wheels.sh), or use: - - `wget https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/files/install_kdb_files_for_pytorch_wheels.sh` - - After installing ROCm PyTorch wheels, run the following code: - - ```bash - #Optional; replace 'gfx90a' with your architecture and 5.6 with your preferred ROCm version - export GFX_ARCH=gfx90a - - #Optional - export ROCM_VERSION=5.6 - - ./install_kdb_files_for_pytorch_wheels.sh - ``` - -## Using the PyTorch ROCm base Docker image - -The pre-built base Docker image has all dependencies installed, including: - -* ROCm -* Torchvision -* Conda packages -* The compiler toolchain - -Additionally, a particular environment flag (`BUILD_ENVIRONMENT`) is set, which is used by the build -scripts to determine the configuration of the build environment. - -1. Download the Docker image. This is the base image, which does not contain PyTorch. - - ```bash - docker pull rocm/pytorch:latest-base - ``` - -2. Start a Docker container using the downloaded image. - - ```bash - docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest-base - ``` - - You can also pass the `-v` argument to mount any data directories from the host onto the container. - -3. Clone the PyTorch repository. - - ```bash - cd ~ - git clone https://github.com/pytorch/pytorch.git - cd /pytorch - git submodule update --init --recursive - ``` - -4. Set ROCm architecture (optional). The Docker image tag is `rocm/pytorch:latest-base`. 
- - :::{note} - By default in the `rocm/pytorch:latest-base` image, PyTorch builds simultaneously for the following - architectures: - * gfx900 - * gfx906 - * gfx908 - * gfx90a - * gfx1030 - ::: - - If you want to compile _only_ for your microarchitecture (uarch), run: - - ```bash - export PYTORCH_ROCM_ARCH= - ``` - - Where `` is the architecture reported by the `rocminfo` command. - - To find your uarch, run: - - ```bash - rocminfo | grep gfx - ``` - -5. Build PyTorch. - - ```bash - ./.ci/pytorch/build.sh - ``` - - This converts PyTorch sources for - [HIP compatibility](https://www.amd.com/en/developer/rocm-hub/hip-sdk.html) and builds the - PyTorch framework. - - To check if your build is successful, run: - - ```bash - echo $? # should return 0 if success - ``` - -## Using the PyTorch upstream Docker file - -If you don't want to use a prebuilt base Docker image, you can build a custom base Docker image -using scripts from the PyTorch repository. This uses a standard Docker image from operating system -maintainers and installs all the required dependencies, including: - -* ROCm -* Torchvision -* Conda packages -* The compiler toolchain - -1. Clone the PyTorch repository. - - ```bash - cd ~ - git clone https://github.com/pytorch/pytorch.git - cd /pytorch - git submodule update --init --recursive - ``` - -2. Build the PyTorch Docker image. - - ```bash - cd .ci/docker - ./build.sh pytorch-linux--rocm-py -t rocm/pytorch:build_from_dockerfile - ``` - - Where: - * ``: `ubuntu20.04` (or `focal`), `ubuntu22.04` (or `jammy`), `centos7.5`, or `centos9` - * ``: `5.4`, `5.5`, or `5.6` - * ``: `3.8`-`3.11` - - To verify that your image was successfully created, run: - - `docker image ls rocm/pytorch:build_from_dockerfile` - - If successful, the output looks like this: - - ```bash - REPOSITORY TAG IMAGE ID CREATED SIZE - rocm/pytorch build_from_dockerfile 17071499be47 2 minutes ago 32.8GB - ``` - -3. Start a Docker container using the image with the mounted PyTorch folder. 
- - ```bash - docker run -it --cap-add=SYS_PTRACE --security-opt --user root \ - seccomp=unconfined --device=/dev/kfd --device=/dev/dri \ - --group-add video --ipc=host --shm-size 8G \ - -v ~/pytorch:/pytorch rocm/pytorch:build_from_dockerfile - ``` - - You can also pass the `-v` argument to mount any data directories from the host onto the container. - -4. Go to the PyTorch directory. - - ```bash - cd pytorch - ``` - -5. Set ROCm architecture. - - To determine your AMD architecture, run: - - ```bash - rocminfo | grep gfx - ``` - - The result looks like this (for `gfx1030` architecture): - - ```bash - Name: gfx1030 - Name: amdgcn-amd-amdhsa--gfx1030 - ``` - - Set the `PYTORCH_ROCM_ARCH` environment variable to specify the architectures you want to - build PyTorch for. - - ```bash - export PYTORCH_ROCM_ARCH= - ``` - - where `` is the architecture reported by the `rocminfo` command. - -6. Build PyTorch. - - ```bash - ./.ci/pytorch/build.sh - ``` - - This converts PyTorch sources for - [HIP compatibility](https://www.amd.com/en/developer/rocm-hub/hip-sdk.html) and builds the - PyTorch framework. - - To check if your build is successful, run: - - ```bash - echo $? # should return 0 if success - ``` - -## Testing the PyTorch installation - -You can use PyTorch unit tests to validate your PyTorch installation. If you used a -**prebuilt PyTorch Docker image from AMD ROCm DockerHub** or installed an -**official wheels package**, validation tests are not necessary. - -If you want to manually run unit tests to validate your PyTorch installation fully, follow these steps: - -1. Import the torch package in Python to test if PyTorch is installed and accessible. - - :::{note} - Do not run the following command in the PyTorch git folder. - ::: - - ```bash - python3 -c 'import torch' 2> /dev/null && echo 'Success' || echo 'Failure' - ``` - -2. Check if the GPU is accessible from PyTorch. In the PyTorch framework, `torch.cuda` is a generic way - to access the GPU. 
This can only access an AMD GPU if one is available. - - ```bash - python3 -c 'import torch; print(torch.cuda.is_available())' - ``` - -3. Run unit tests to validate the PyTorch installation fully. - - :::{note} - You must run the following command from the PyTorch home directory. - ::: - - ```bash - PYTORCH_TEST_WITH_ROCM=1 python3 test/run_test.py --verbose \ - --include test_nn test_torch test_cuda test_ops \ - test_unary_ufuncs test_binary_ufuncs test_autograd - ``` - - This command ensures that the required environment variable is set to skip certain unit tests for - ROCm. This also applies to wheel installs in a non-controlled environment. - - :::{note} - Make sure your PyTorch source code corresponds to the PyTorch wheel or the installation in the - Docker image. Incompatible PyTorch source code can give errors when running unit tests. - ::: - - Some tests may be skipped, as appropriate, based on your system configuration. ROCm doesn't - support all PyTorch features; tests that evaluate unsupported features are skipped. Other tests might - be skipped, depending on the host or GPU memory and the number of available GPUs. - - If the compilation and installation are correct, all tests will pass. - -4. Run individual unit tests. - - ```bash - PYTORCH_TEST_WITH_ROCM=1 python3 test/test_nn.py --verbose - ``` - - You can replace `test_nn.py` with any other test set. - -## Running a basic PyTorch example - -The PyTorch examples repository provides basic examples that exercise the functionality of your -framework. - -Two of our favorite testing databases are: - -* **MNIST** (Modified National Institute of Standards and Technology): A database of handwritten - digits that can be used to train a Convolutional Neural Network for **handwriting recognition**. -* **ImageNet**: A database of images that can be used to train a network for - **visual object recognition**. - -### MNIST PyTorch example - -1. Clone the PyTorch examples repository. 
- - ```bash - git clone https://github.com/pytorch/examples.git - ``` - -2. Go to the MNIST example folder. - - ```bash - cd examples/mnist - ``` - -3. Follow the instructions in the `README.md`` file in this folder to install the requirements. Then run: - - ```bash - python3 main.py - ``` - - This generates the following output: - - ```bash - ... - Train Epoch: 14 [58240/60000 (97%)] Loss: 0.010128 - Train Epoch: 14 [58880/60000 (98%)] Loss: 0.001348 - Train Epoch: 14 [59520/60000 (99%)] Loss: 0.005261 - - Test set: Average loss: 0.0252, Accuracy: 9921/10000 (99%) - ``` - -### ImageNet PyTorch example - -1. Clone the PyTorch examples repository (if you didn't already do this step in the preceding MNIST example). - - ```bash - git clone https://github.com/pytorch/examples.git - ``` - -2. Go to the ImageNet example folder. - - ```bash - cd examples/imagenet - ``` - -3. Follow the instructions in the `README.md` file in this folder to install the Requirements. Then run: - - ```bash - python3 main.py - ``` diff --git a/docs/how-to/3rd-party/pytorch-install.rst b/docs/how-to/3rd-party/pytorch-install.rst new file mode 100644 index 00000000..2bd5bd17 --- /dev/null +++ b/docs/how-to/3rd-party/pytorch-install.rst @@ -0,0 +1,473 @@ +Installing PyTorch for ROCm +########################### + +`PyTorch `_ is an open-source tensor library designed for deep learning. PyTorch on +ROCm provides mixed-precision and large-scale training using our +`MIOpen `_ and +`RCCL `_ libraries. + +To install `PyTorch for ROCm `_, you have the following options: + +* :ref:`using-docker-with-pytorch-pre-installed` + (recommended) +* :ref:`using-wheels-package` +* :ref:`using-pytorch-rocm-docker-image` +* :ref:`using-pytorch-upstream-docker-image` + +For hardware, software, and third-party framework compatibility between ROCm and PyTorch, refer to: + +* :doc:`/reference/system-requirements` +* :doc:`/reference/3rd-party-support-matrix` + +.. 
_using-docker-with-pytorch-pre-installed: + +Using a Docker image with PyTorch pre-installed +*********************************************** + +1. Download the latest public `PyTorch Docker image <https://hub.docker.com/r/rocm/pytorch>`_. + + .. code-block:: bash + + docker pull rocm/pytorch:latest + + You can also download a specific and supported configuration with different user-space ROCm + versions, PyTorch versions, and operating systems. + +2. Start a Docker container using the image. + + .. code-block:: bash + + docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \ + --device=/dev/kfd --device=/dev/dri --group-add video \ + --ipc=host --shm-size 8G rocm/pytorch:latest + + .. note:: + + This will automatically download the image if it does not exist on the host. You can also pass the ``-v`` + argument to mount any data directories from the host onto the container. + +.. _install_pytorch_wheels: +.. _using-wheels-package: + +Using a wheels package +********************** + +PyTorch supports the ROCm platform by providing tested wheels packages. To access this feature, go +to `pytorch.org/get-started/locally/ <https://pytorch.org/get-started/locally/>`_. For the correct +wheels command, you must select 'Linux', 'Python', 'pip', and 'ROCm' in the matrix. + +1. Choose one of the following three options: + + **Option 1:** + + a. Download a base Docker image with the correct user-space ROCm version. + + .. list-table:: + :header-rows: 1 + + * - Base OS + - Docker Image + * - Ubuntu 20.04 + - `rocm/dev-ubuntu-20.04 <https://hub.docker.com/r/rocm/dev-ubuntu-20.04>`_ + * - Ubuntu 22.04 + - `rocm/dev-ubuntu-22.04 <https://hub.docker.com/r/rocm/dev-ubuntu-22.04>`_ + * - CentOS 7 + - `rocm/dev-centos-7 <https://hub.docker.com/r/rocm/dev-centos-7>`_ + + b. Pull the selected image. + + .. code-block:: bash + + docker pull rocm/dev-ubuntu-20.04:latest + + c. Start a Docker container using the downloaded image. + + .. 
code-block:: bash + + docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/dev-ubuntu-20.04:latest + + **Option 2:** + + Select a base OS Docker image (check :doc:`/reference/system-requirements`). + + Pull the selected base OS image (for example, Ubuntu 20.04). + + .. code-block:: bash + + docker pull ubuntu:20.04 + + Start a Docker container using the downloaded image. + + .. code-block:: bash + + docker run -it --device=/dev/kfd --device=/dev/dri --group-add video ubuntu:20.04 + + Install ROCm using the directions in the :doc:`Installation section `. + + **Option 3:** + + Install on bare metal. Check :doc:`/reference/system-requirements` and install ROCm using the + directions in the :doc:`Installation section `. + +2. Install the required dependencies for the wheels package. + + .. code-block:: bash + + sudo apt update + sudo apt install libjpeg-dev python3-dev python3-pip + pip3 install wheel setuptools + +3. Install ``torch``, ``torchvision``, and ``torchaudio``, as specified in the + `installation matrix <https://pytorch.org/get-started/locally/>`_. + + .. note:: + + The following command uses the ROCm 5.6 PyTorch wheel. If you want a different version of ROCm, + modify the command accordingly. + + .. code-block:: bash + + pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6/ + +4. (Optional) Use MIOpen kdb files with ROCm PyTorch wheels. + + PyTorch uses `MIOpen <https://github.com/ROCmSoftwarePlatform/MIOpen>`_ for machine learning + primitives, which are compiled into kernels at runtime. Runtime compilation causes a small warm-up + phase when starting PyTorch, and MIOpen kdb files contain precompiled kernels that can speed up + application warm-up phases. For more information, refer to the + :doc:`MIOpen installation page `. + + MIOpen kdb files can be used with ROCm PyTorch wheels. However, the kdb files need to be placed in + a specific location with respect to the PyTorch installation path. 
A helper script simplifies this task by + taking the ROCm version and GPU architecture as inputs. This works for Ubuntu and CentOS. + + You can download the helper script here: + `install_kdb_files_for_pytorch_wheels.sh <https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/files/install_kdb_files_for_pytorch_wheels.sh>`_, or use: + + .. code-block:: bash + + wget https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/files/install_kdb_files_for_pytorch_wheels.sh + + After installing ROCm PyTorch wheels, run the following code: + + .. code-block:: bash + + #Optional; replace 'gfx90a' with your architecture and 5.6 with your preferred ROCm version + export GFX_ARCH=gfx90a + + #Optional + export ROCM_VERSION=5.6 + + ./install_kdb_files_for_pytorch_wheels.sh + +.. _using-pytorch-rocm-docker-image: + +Using the PyTorch ROCm base Docker image +**************************************** + +The pre-built base Docker image has all dependencies installed, including: + +* ROCm +* Torchvision +* Conda packages +* The compiler toolchain + +Additionally, a particular environment flag (``BUILD_ENVIRONMENT``) is set, which is used by the build +scripts to determine the configuration of the build environment. + +1. Download the Docker image. This is the base image, which does not contain PyTorch. + + .. code-block:: bash + + docker pull rocm/pytorch:latest-base + +2. Start a Docker container using the downloaded image. + + .. code-block:: bash + + docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest-base + + You can also pass the ``-v`` argument to mount any data directories from the host onto the container. + +3. Clone the PyTorch repository. + + .. code-block:: bash + + cd ~ + git clone https://github.com/pytorch/pytorch.git + cd pytorch + git submodule update --init --recursive + +4. Set ROCm architecture (optional). The Docker image tag is ``rocm/pytorch:latest-base``. + + .. 
note:: + + By default in the ``rocm/pytorch:latest-base`` image, PyTorch builds simultaneously for the following + architectures: + * gfx900 + * gfx906 + * gfx908 + * gfx90a + * gfx1030 + + If you want to compile *only* for your microarchitecture (uarch), run: + + .. code-block:: bash + + export PYTORCH_ROCM_ARCH= + + Where ```` is the architecture reported by the ``rocminfo`` command. + + To find your uarch, run: + + .. code-block:: bash + + rocminfo | grep gfx + +5. Build PyTorch. + + .. code-block:: bash + + ./.ci/pytorch/build.sh + + This converts PyTorch sources for + `HIP compatibility <https://www.amd.com/en/developer/rocm-hub/hip-sdk.html>`_ and builds the + PyTorch framework. + + To check if your build is successful, run: + + .. code-block:: bash + + echo $? # should return 0 if success + +.. _using-pytorch-upstream-docker-image: + +Using the PyTorch upstream Docker file +************************************** + +If you don't want to use a prebuilt base Docker image, you can build a custom base Docker image +using scripts from the PyTorch repository. This uses a standard Docker image from operating system +maintainers and installs all the required dependencies, including: + +* ROCm +* Torchvision +* Conda packages +* The compiler toolchain + +1. Clone the PyTorch repository. + + .. code-block:: bash + + cd ~ + git clone https://github.com/pytorch/pytorch.git + cd pytorch + git submodule update --init --recursive + +2. Build the PyTorch Docker image. + + .. code-block:: bash + + cd .ci/docker + ./build.sh pytorch-linux--rocm-py -t rocm/pytorch:build_from_dockerfile + + Where: + * ````: ``ubuntu20.04`` (or ``focal``), ``ubuntu22.04`` (or ``jammy``), ``centos7.5``, or ``centos9`` + * ````: ``5.4``, ``5.5``, or ``5.6`` + * ````: ``3.8`` - ``3.11`` + + To verify that your image was successfully created, run: + + .. code-block:: bash + + docker image ls rocm/pytorch:build_from_dockerfile + + If successful, the output looks like this: + + .. 
code-block:: bash + + REPOSITORY TAG IMAGE ID CREATED SIZE + rocm/pytorch build_from_dockerfile 17071499be47 2 minutes ago 32.8GB + +3. Start a Docker container using the image with the mounted PyTorch folder. + + .. code-block:: bash + + docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \ + --user root --device=/dev/kfd --device=/dev/dri \ + --group-add video --ipc=host --shm-size 8G \ + -v ~/pytorch:/pytorch rocm/pytorch:build_from_dockerfile + + You can also pass the ``-v`` argument to mount any data directories from the host onto the container. + +4. Go to the PyTorch directory. + + .. code-block:: bash + + cd pytorch + +5. Set ROCm architecture. + + To determine your AMD architecture, run: + + .. code-block:: bash + + rocminfo | grep gfx + + The result looks like this (for ``gfx1030`` architecture): + + .. code-block:: bash + + Name: gfx1030 + Name: amdgcn-amd-amdhsa--gfx1030 + + Set the ``PYTORCH_ROCM_ARCH`` environment variable to specify the architectures you want to + build PyTorch for. + + .. code-block:: bash + + export PYTORCH_ROCM_ARCH= + + where ```` is the architecture reported by the ``rocminfo`` command. + +6. Build PyTorch. + + .. code-block:: bash + + ./.ci/pytorch/build.sh + + This converts PyTorch sources for + `HIP compatibility <https://www.amd.com/en/developer/rocm-hub/hip-sdk.html>`_ and builds the + PyTorch framework. + + To check if your build is successful, run: + + .. code-block:: bash + + echo $? # should return 0 if success + +Testing the PyTorch installation +******************************** + +You can use PyTorch unit tests to validate your PyTorch installation. If you used a +**prebuilt PyTorch Docker image from AMD ROCm DockerHub** or installed an +**official wheels package**, validation tests are not necessary. + +If you want to manually run unit tests to validate your PyTorch installation fully, follow these steps: + +1. Import the torch package in Python to test if PyTorch is installed and accessible. + + .. 
note:: + + Do not run the following command in the PyTorch git folder. + + .. code-block:: bash + + python3 -c 'import torch' 2> /dev/null && echo 'Success' || echo 'Failure' + +2. Check if the GPU is accessible from PyTorch. In the PyTorch framework, ``torch.cuda`` is a generic way + to access the GPU. This can only access an AMD GPU if one is available. + + .. code-block:: bash + + python3 -c 'import torch; print(torch.cuda.is_available())' + + +3. Run unit tests to validate the PyTorch installation fully. + + .. note:: + + You must run the following command from the PyTorch home directory. + + .. code-block:: bash + + PYTORCH_TEST_WITH_ROCM=1 python3 test/run_test.py --verbose \ + --include test_nn test_torch test_cuda test_ops \ + test_unary_ufuncs test_binary_ufuncs test_autograd + + This command ensures that the required environment variable is set to skip certain unit tests for + ROCm. This also applies to wheel installs in a non-controlled environment. + + .. note:: + + Make sure your PyTorch source code corresponds to the PyTorch wheel or the installation in the + Docker image. Incompatible PyTorch source code can give errors when running unit tests. + + Some tests may be skipped, as appropriate, based on your system configuration. ROCm doesn't + support all PyTorch features; tests that evaluate unsupported features are skipped. Other tests might + be skipped, depending on the host or GPU memory and the number of available GPUs. + + If the compilation and installation are correct, all tests will pass. + +4. Run individual unit tests. + + .. code-block:: bash + + PYTORCH_TEST_WITH_ROCM=1 python3 test/test_nn.py --verbose + + You can replace ``test_nn.py`` with any other test set. + +Running a basic PyTorch example +******************************* + +The PyTorch examples repository provides basic examples that exercise the functionality of your +framework. 
+ +Two of our favorite testing databases are: + +* **MNIST** (Modified National Institute of Standards and Technology): A database of handwritten + digits that can be used to train a Convolutional Neural Network for **handwriting recognition**. +* **ImageNet**: A database of images that can be used to train a network for + **visual object recognition**. + +MNIST PyTorch example +===================== + +1. Clone the PyTorch examples repository. + + .. code-block:: bash + + git clone https://github.com/pytorch/examples.git + +2. Go to the MNIST example folder. + + .. code-block:: bash + + cd examples/mnist + +3. Follow the instructions in the ``README.md`` file in this folder to install the requirements. Then run: + + .. code-block:: bash + + python3 main.py + + This generates the following output: + + .. code-block:: + + ... + Train Epoch: 14 [58240/60000 (97%)] Loss: 0.010128 + Train Epoch: 14 [58880/60000 (98%)] Loss: 0.001348 + Train Epoch: 14 [59520/60000 (99%)] Loss: 0.005261 + + Test set: Average loss: 0.0252, Accuracy: 9921/10000 (99%) + +ImageNet PyTorch example +======================== + +1. Clone the PyTorch examples repository (if you didn't already do this step in the preceding MNIST example). + + .. code-block:: bash + + git clone https://github.com/pytorch/examples.git + +2. Go to the ImageNet example folder. + + .. code-block:: bash + + cd examples/imagenet + +3. Follow the instructions in the ``README.md`` file in this folder to install the Requirements. Then run: + + .. code-block:: bash + + python3 main.py diff --git a/docs/how-to/3rd-party/tensorflow-install.md b/docs/how-to/3rd-party/tensorflow-install.md deleted file mode 100644 index fc249c7f..00000000 --- a/docs/how-to/3rd-party/tensorflow-install.md +++ /dev/null @@ -1,191 +0,0 @@ -# Installing TensorFlow for ROCm - -## TensorFlow - -TensorFlow is an open-source library for solving machine-learning, -deep-learning, and artificial-intelligence problems. 
It can be used to solve -many problems across different sectors and industries but primarily focuses on -training and inference in neural networks. It is one of the most popular and -in-demand frameworks and is very active in open source contribution and -development. - -:::{warning} -ROCm 5.6 and 5.7 deviates from the standard practice of supporting the last three -TensorFlow versions. This is due to incompatibilities between earlier TensorFlow -versions and changes introduced in the ROCm 5.6 compiler. Refer to the following -version support matrix: - -| ROCm | TensorFlow | -|:-----:|:----------:| -| 5.6.x | 2.12 | -| 5.7.0 | 2.12, 2.13 | -| Post-5.7.0 | Last three versions at ROCm release. | -::: - -### Installing TensorFlow - -The following sections contain options for installing TensorFlow. - -#### Option 1: using a Docker image - -To install ROCm on bare metal, follow the section -[Linux installation guide](../install/linux/install.md). The recommended option to -get a TensorFlow environment is through Docker. - -Using Docker provides portability and access to a prebuilt Docker container that -has been rigorously tested within AMD. This might also save compilation time and -should perform as tested without facing potential installation issues. -Follow these steps: - -1. Pull the latest public TensorFlow Docker image. - - ```bash - docker pull rocm/tensorflow:latest - ``` - -2. Once you have pulled the image, run it by using the command below: - - ```bash - docker run -it --network=host --device=/dev/kfd --device=/dev/dri \ - --ipc=host --shm-size 16G --group-add video --cap-add=SYS_PTRACE \ - --security-opt seccomp=unconfined rocm/tensorflow:latest - ``` - -#### Option 2: using a wheels package - -To install TensorFlow using the wheels package, follow these steps: - -1. Check the Python version. 
- - ```bash - python3 --version - ``` - - | If: | Then: | - |:-----------------------------------:|:--------------------------------:| - | The Python version is less than 3.7 | Upgrade Python. | - | The Python version is more than 3.7 | Skip this step and go to Step 3. | - - ```{note} - The supported Python versions are: - - * 3.7 - * 3.8 - * 3.9 - * 3.10 - ``` - - ```bash - sudo apt-get install python3.7 # or python3.8 or python 3.9 or python 3.10 - ``` - -2. Set up multiple Python versions using update-alternatives. - - ```bash - update-alternatives --query python3 - sudo update-alternatives --install - /usr/bin/python3 python3 /usr/bin/python[version] [priority] - ``` - - ```{note} - Follow the instruction in Step 2 for incompatible Python versions. - ``` - - ```bash - sudo update-alternatives --config python3 - ``` - -3. Follow the screen prompts, and select the Python version installed in Step 2. - -4. Install or upgrade PIP. - - ```bash - sudo apt install python3-pip - ``` - - To install PIP, use the following: - - ```bash - /usr/bin/python[version] -m pip install --upgrade pip - ``` - - Upgrade PIP for Python version installed in step 2: - - ```bash - sudo pip3 install --upgrade pip - ``` - -5. Install TensorFlow for the Python version as indicated in Step 2. - - ```bash - /usr/bin/python[version] -m pip install --user tensorflow-rocm==[wheel-version] --upgrade - ``` - - For a valid wheel version for a ROCm release, refer to the instruction below: - - ```bash - sudo apt install rocm-libs rccl - ``` - -6. Update `protobuf` to 3.19 or lower. - - ```bash - /usr/bin/python3.7 -m pip install protobuf=3.19.0 - sudo pip3 install tensorflow - ``` - -7. Set the environment variable `PYTHONPATH`. - - ```bash - export PYTHONPATH="./.local/lib/python[version]/site-packages:$PYTHONPATH" #Use same python version as in step 2 - ``` - -8. Install libraries. - - ```bash - sudo apt install rocm-libs rccl - ``` - -9. Test installation. 
- - ```bash - python3 -c 'import tensorflow' 2> /dev/null && echo 'Success' || echo 'Failure' - ``` - - ```{note} - For details on `tensorflow-rocm` wheels and ROCm version compatibility, see: - [https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-rocm-release.md](https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-rocm-release.md) - ``` - -### Test the TensorFlow installation - -To test the installation of TensorFlow, run the container image as specified in -the previous section Installing TensorFlow. Ensure you have access to the Python -shell in the Docker container. - -```bash -python3 -c 'import tensorflow' 2> /dev/null && echo ‘Success’ || echo ‘Failure’ -``` - -### Run a basic TensorFlow example - -The TensorFlow examples repository provides basic examples that exercise the -framework's functionality. The MNIST database is a collection of handwritten -digits that may be used to train a Convolutional Neural Network for handwriting -recognition. - -Follow these steps: - -1. Clone the TensorFlow example repository. - - ```bash - cd ~ - git clone https://github.com/tensorflow/models.git - ``` - -2. Install the dependencies of the code, and run the code. - - ```bash - #pip3 install requirement.txt - #python mnist_tf.py - ``` diff --git a/docs/how-to/3rd-party/tensorflow-install.rst b/docs/how-to/3rd-party/tensorflow-install.rst new file mode 100644 index 00000000..ca260b56 --- /dev/null +++ b/docs/how-to/3rd-party/tensorflow-install.rst @@ -0,0 +1,209 @@ +Installing TensorFlow for ROCm +############################## + +TensorFlow +********** + +TensorFlow is an open-source library for solving machine-learning, +deep-learning, and artificial-intelligence problems. It can be used to solve +many problems across different sectors and industries but primarily focuses on +training and inference in neural networks. 
It is one of the most popular and +in-demand frameworks and is very active in open-source contribution and +development. + +.. warning:: + + ROCm 5.6 and 5.7 deviate from the standard practice of supporting the last three + TensorFlow versions. This is due to incompatibilities between earlier TensorFlow + versions and changes introduced in the ROCm 5.6 compiler. Refer to the following + version support matrix: + +.. list-table:: + :header-rows: 1 + + * - ROCm + - TensorFlow + * - 5.6.x + - 2.12 + * - 5.7.0 + - 2.12, 2.13 + * - Post 5.7.0 + - Last three versions at ROCm release. + +Installing TensorFlow +===================== + +The following sections contain options for installing TensorFlow. + +Option 1: using a Docker image +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +To install ROCm on bare metal, follow +:doc:`/tutorial/install-overview`. The recommended option to +get a TensorFlow environment is through Docker. + +Using Docker provides portability and access to a prebuilt Docker container that +has been rigorously tested within AMD. This might also save compilation time and +should perform as tested without facing potential installation issues. +Follow these steps: + +1. Pull the latest public TensorFlow Docker image. + + .. code-block:: shell + + docker pull rocm/tensorflow:latest + +2. Once you have pulled the image, run it by using the command below: + + .. code-block:: shell + + docker run -it --network=host --device=/dev/kfd --device=/dev/dri \ + --ipc=host --shm-size 16G --group-add video --cap-add=SYS_PTRACE \ + --security-opt seccomp=unconfined rocm/tensorflow:latest + +Option 2: using a wheels package +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +To install TensorFlow using the wheels package, follow these steps: + +1. Check the Python version. + + .. code-block:: shell + + python3 --version + + .. list-table:: + :header-rows: 1 + + * - If + - Then + * - The Python version is less than 3.7 + - Upgrade Python. 
+ * - The Python version is 3.7 or later + - Skip this step and go to Step 3. + + .. note:: + + The supported Python versions are: + + * 3.7 + * 3.8 + * 3.9 + * 3.10 + + .. code-block:: shell + + sudo apt-get install python3.7 # or python3.8, python3.9, or python3.10 + +2. Set up multiple Python versions using update-alternatives. + + .. code-block:: shell + + update-alternatives --query python3 + sudo update-alternatives --install \ + /usr/bin/python3 python3 /usr/bin/python[version] [priority] + + .. note:: + + Follow the instruction in Step 2 for incompatible Python versions. + + .. code-block:: shell + + sudo update-alternatives --config python3 + +3. Follow the screen prompts, and select the Python version installed in Step 2. + +4. Install or upgrade PIP. + + .. code-block:: shell + + sudo apt install python3-pip + + To upgrade PIP through a specific Python interpreter, use the following: + + .. code-block:: shell + + /usr/bin/python[version] -m pip install --upgrade pip + + Upgrade PIP for the Python version installed in Step 2: + + .. code-block:: shell + + sudo pip3 install --upgrade pip + +5. Install TensorFlow for the Python version as indicated in Step 2. + + .. code-block:: shell + + /usr/bin/python[version] -m pip install --user tensorflow-rocm==[wheel-version] --upgrade + + For a valid wheel version for a ROCm release, refer to the note at the end of this procedure. The following command installs the ROCm libraries (the same command as in Step 8): + + .. code-block:: shell + + sudo apt install rocm-libs rccl + +6. Update ``protobuf`` to 3.19 or lower. + + .. code-block:: shell + + /usr/bin/python3.7 -m pip install protobuf==3.19.0 + sudo pip3 install tensorflow + +7. Set the environment variable ``PYTHONPATH``. + + .. code-block:: shell + + export PYTHONPATH="./.local/lib/python[version]/site-packages:$PYTHONPATH" # Use the same Python version as in Step 2 + +8. Install libraries. + + .. code-block:: shell + + sudo apt install rocm-libs rccl + +9. Test installation. + + .. code-block:: shell + + python3 -c 'import tensorflow' 2> /dev/null && echo 'Success' || echo 'Failure' + + .. 
note:: + + For details on ``tensorflow-rocm`` wheels and ROCm version compatibility, see: + https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-rocm-release.md + +Test the TensorFlow installation +================================ + +To test the installation of TensorFlow, run the container image as specified in +the previous section, Installing TensorFlow. Ensure you have access to the Python +shell in the Docker container. + +.. code-block:: shell + + python3 -c 'import tensorflow' 2> /dev/null && echo 'Success' || echo 'Failure' + +Run a basic TensorFlow example +============================== + +The TensorFlow examples repository provides basic examples that exercise the +framework's functionality. The MNIST database is a collection of handwritten +digits that may be used to train a Convolutional Neural Network for handwriting +recognition. + +Follow these steps: + +1. Clone the TensorFlow example repository. + + .. code-block:: shell + + cd ~ + git clone https://github.com/tensorflow/models.git + +2. Install the dependencies of the code, and run the code. + + .. code-block:: shell + + pip3 install -r requirement.txt + python3 mnist_tf.py diff --git a/docs/how-to/amdgpu-install.rst b/docs/how-to/amdgpu-install.rst index e84529ce..5a362194 100644 --- a/docs/how-to/amdgpu-install.rst +++ b/docs/how-to/amdgpu-install.rst @@ -1,5 +1,5 @@ -``amdgpu-install`` Script -######################### +Installation via AMDGPU Script +############################## ``amdgpu-install`` is a tool that helps you install and update AMDGPU and ROCm and its components. @@ -17,7 +17,7 @@ Installation Installation of ``amdgpu-install`` differs slightly depending on the OS and its package manager. -Make sure that the :doc:`how-to/prerequisites` are met before installing. +Make sure that the :doc:`/how-to/prerequisites` are met before installing.
Ubuntu ====== @@ -200,7 +200,7 @@ You must add the ROCm repositories manually for all ROCm releases you want to install except the latest one. The amdgpu-install script automatically adds the required repositories for the latest release. -See the section "Register ROCm Packages" in :doc:`how-to/native-install/index` +See the section "Register ROCm Packages" in :doc:`/how-to/native-install/index` for: - :ref:`Ubuntu ` diff --git a/docs/how-to/deep-learning-rocm.md b/docs/how-to/deep-learning-rocm.md deleted file mode 100644 index 43d0b7ea..00000000 --- a/docs/how-to/deep-learning-rocm.md +++ /dev/null @@ -1,15 +0,0 @@ -# Deep learning guide - -The following sections cover the different framework installations for ROCm and -deep-learning applications. The following image provides -the sequential flow for the use of each framework. Refer to the ROCm Compatible -Frameworks Release Notes for each framework's most current release notes at -[Third party support](../about/compatibility/3rd-party-support-matrix.md). - -![ROCm Compatible Frameworks Flowchart](../data/install/magma-install/magma005.png "ROCm Compatible Frameworks") - -## Frameworks installation - -* [Installing PyTorch](../install/pytorch-install.md) -* [Installing TensorFlow](../install/tensorflow-install.md) -* [Installing MAGMA](../install/magma-install.md) diff --git a/docs/how-to/docker.md b/docs/how-to/docker.md deleted file mode 100644 index 8464155c..00000000 --- a/docs/how-to/docker.md +++ /dev/null @@ -1,90 +0,0 @@ -# Deploy ROCm Docker containers - -## Prerequisites - -Docker containers share the kernel with the host operating system, therefore the -ROCm kernel-mode driver must be installed on the host. Please refer to -{ref}`linux-install-methods` on installing `amdgpu-dkms`. The other -user-space parts (like the HIP-runtime or math libraries) of the ROCm stack will -be loaded from the container image and don't need to be installed to the host. 
- -(docker-access-gpus-in-container)= - -## Accessing GPUs in containers - -In order to access GPUs in a container (to run applications using HIP, OpenCL or -OpenMP offloading) explicit access to the GPUs must be granted. - -The ROCm runtimes make use of multiple device files: - -* `/dev/kfd`: the main compute interface shared by all GPUs -* `/dev/dri/renderD`: direct rendering interface (DRI) devices for each - GPU. **``** is a number for each card in the system starting from 128. - -Exposing these devices to a container is done by using the -[`--device`](https://docs.docker.com/engine/reference/commandline/run/#device) -option, i.e. to allow access to all GPUs expose `/dev/kfd` and all -`/dev/dri/renderD` devices: - -```shell -docker run --device /dev/kfd --device /dev/renderD128 --device /dev/renderD129 ... -``` - -More conveniently, instead of listing all devices, the entire `/dev/dri` folder -can be exposed to the new container: - -```shell -docker run --device /dev/kfd --device /dev/dri -``` - -Note that this gives more access than strictly required, as it also exposes the -other device files found in that folder to the container. - -(docker-restrict-gpus)= - -### Restricting a container to a subset of the GPUs - -If a `/dev/dri/renderD` device is not exposed to a container then it cannot use -the GPU associated with it; this allows to restrict a container to any subset of -devices. - -For example to allow the container to access the first and third GPU start it -like: - -```shell -docker run --device /dev/kfd --device /dev/dri/renderD128 --device /dev/dri/renderD130 -``` - -### Additional options - -The performance of an application can vary depending on the assignment of GPUs -and CPUs to the task. Typically, `numactl` is installed as part of many HPC -applications to provide GPU/CPU mappings. This Docker runtime option supports -memory mapping and can improve performance. 
- -```shell ---security-opt seccomp=unconfined -``` - -This option is recommended for Docker Containers running HPC applications. - -```shell -docker run --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined ... -``` - -## Docker images in the ROCm ecosystem - -### Base images - - hosts images useful for users -wishing to build their own containers leveraging ROCm. The built images are -available from [Docker Hub](https://hub.docker.com/u/rocm). In particular -`rocm/rocm-terminal` is a small image with the prerequisites to build HIP -applications, but does not include any libraries. - -### Applications - -AMD provides pre-built images for various GPU-ready applications through its -Infinity Hub at . -Examples for invoking each application and suggested parameters used for -benchmarking are also provided there. diff --git a/docs/how-to/docker.rst b/docs/how-to/docker.rst new file mode 100644 index 00000000..b7643deb --- /dev/null +++ b/docs/how-to/docker.rst @@ -0,0 +1,98 @@ +Deploy ROCm Docker Containers +############################# + +Prerequisites +************* + +Docker containers share the kernel with the host operating system, therefore the +ROCm kernel-mode driver must be installed on the host. Refer to +:doc:`/tutorial/install-overview` for instructions on installing ``amdgpu-dkms``. The other +user-space parts of the ROCm stack (like the HIP runtime or math libraries) are +loaded from the container image and don't need to be installed on the host. + +.. _docker-access-gpus-in-container: + +Accessing GPUs in containers +**************************** + +To access GPUs in a container (to run applications using HIP, OpenCL, or +OpenMP offloading), explicit access to the GPUs must be granted. + +The ROCm runtimes make use of multiple device files: + +- ``/dev/kfd``: the main compute interface shared by all GPUs +- ``/dev/dri/renderD``: direct rendering interface (DRI) devices for each + GPU.
The number that follows ``renderD`` identifies each card in the system, starting from 128. + +Exposing these devices to a container is done by using the +`--device <https://docs.docker.com/engine/reference/commandline/run/#device>`_ +option. For example, to allow access to all GPUs, expose ``/dev/kfd`` and all +``/dev/dri/renderD`` devices: + +.. code-block:: shell + + docker run --device /dev/kfd --device /dev/dri/renderD128 --device /dev/dri/renderD129 ... + +More conveniently, instead of listing all devices, the entire ``/dev/dri`` folder +can be exposed to the new container: + +.. code-block:: shell + + docker run --device /dev/kfd --device /dev/dri + +Note that this gives more access than strictly required, as it also exposes the +other device files found in that folder to the container. + +.. _docker-restrict-gpus: + +Restricting a container to a subset of the GPUs +=============================================== + +If a ``/dev/dri/renderD`` device is not exposed to a container, then the container cannot use +the GPU associated with it. This lets you restrict a container to any subset of +devices. + +For example, to allow the container to access the first and third GPU, start it +like this: + +.. code-block:: shell + + docker run --device /dev/kfd --device /dev/dri/renderD128 --device /dev/dri/renderD130 + +Additional options +================== + +The performance of an application can vary depending on the assignment of GPUs +and CPUs to the task. Typically, ``numactl`` is installed as part of many HPC +applications to provide GPU/CPU mappings. The following Docker runtime option supports +memory mapping and can improve performance: + +.. code-block:: shell + + --security-opt seccomp=unconfined + +This option is recommended for Docker containers running HPC applications. + +.. code-block:: shell + + docker run --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined ...
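The render-node numbering described under "Restricting a container to a subset of the GPUs" can be sketched in Python as follows. This is a minimal illustration, not part of ROCm or Docker: the helper name ``docker_device_args`` is hypothetical, and the mapping from a zero-based GPU index to ``renderD(128 + index)`` is an assumption drawn from the "starting from 128" statement in the text. Check the actual nodes on your system with ``ls /dev/dri`` before relying on it.

```python
# Hypothetical helper (not part of ROCm or Docker): builds the ``--device``
# flags for ``docker run`` so that only a chosen subset of GPUs is exposed.

RENDER_NODE_BASE = 128  # per the text, render nodes are numbered from 128


def docker_device_args(gpu_indices):
    """Return ``--device`` flags exposing /dev/kfd plus the selected GPUs.

    ``gpu_indices`` are zero-based GPU positions (0 = first GPU). The mapping
    index -> renderD(128 + index) is an assumption; verify it on your system.
    """
    args = ["--device", "/dev/kfd"]  # compute interface shared by all GPUs
    for index in sorted(set(gpu_indices)):  # de-duplicate and order the nodes
        if index < 0:
            raise ValueError(f"GPU index must be non-negative, got {index}")
        args += ["--device", f"/dev/dri/renderD{RENDER_NODE_BASE + index}"]
    return args


if __name__ == "__main__":
    # First and third GPU, matching the example in the text.
    print("docker run " + " ".join(docker_device_args([0, 2])))
```

Run as a script, this prints ``docker run --device /dev/kfd --device /dev/dri/renderD128 --device /dev/dri/renderD130``, which matches the command shown in the subset example. Returning a list of arguments rather than a single string keeps the result usable with ``subprocess.run`` without shell quoting.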
+ +Docker images in the ROCm ecosystem +*********************************** + +Base images +=========== + +The `ROCm Docker repository `_ hosts images useful for users +wishing to build their own containers leveraging ROCm. The built images are +available from `Docker Hub `_. In particular +``rocm/rocm-terminal`` is a small image with the prerequisites to build HIP +applications, but does not include any libraries. + +Applications +============ + +AMD provides pre-built images for various GPU-ready applications through +`Infinity Hub `_. +Examples for invoking each application and suggested parameters used for +benchmarking are also provided there. diff --git a/docs/how-to/native-install/index.rst b/docs/how-to/native-install/index.rst index 58cab41d..cc537dd5 100644 --- a/docs/how-to/native-install/index.rst +++ b/docs/how-to/native-install/index.rst @@ -30,4 +30,4 @@ Installation via Native Package Manager See also ******** -- :ref:`/reference/system-requirements` +- :doc:`/reference/system-requirements` diff --git a/docs/how-to/native-install/rhel.rst b/docs/how-to/native-install/rhel.rst index aeca9a15..3654ba1c 100644 --- a/docs/how-to/native-install/rhel.rst +++ b/docs/how-to/native-install/rhel.rst @@ -89,7 +89,7 @@ Upgrading To upgrade an existing ROCm installation to a newer version, follow the steps in :ref:`rhel-register-repo` and :ref:`rhel-install`. -.. _ubuntu-uninstall: +.. _rhel-uninstall: Uninstalling ************ diff --git a/docs/how-to/prerequisites.md b/docs/how-to/prerequisites.md deleted file mode 100644 index 52aea8a8..00000000 --- a/docs/how-to/prerequisites.md +++ /dev/null @@ -1,210 +0,0 @@ -# Installation Prerequisites (Linux) - -You must perform the following steps before installing ROCm and check if the -system meets all the requirements to proceed with the installation. - -## Confirm the System Has a Supported Linux Distribution Version - -The ROCm installation is supported only on specific Linux distributions and -kernel versions. 
- -### Check the Linux Distribution and Kernel Version on Your System - -This section discusses obtaining information about the Linux distribution and -kernel version. - -#### Linux Distribution Information - -Verify the Linux distribution using the following steps: - -1. To obtain the Linux distribution information, type the following command on - your system from the Command Line Interface (CLI): - - ```shell - uname -m && cat /etc/*release - ``` - -2. Confirm that the obtained Linux distribution information matches with those listed in {ref}`supported_distributions`. - - **Example:** Running the command above on an Ubuntu system results in the - following output: - - ```shell - x86_64 - DISTRIB_ID=Ubuntu - DISTRIB_RELEASE=20.04 - DISTRIB_CODENAME=focal - DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS" - ``` - -(check-kernel-info)= - -#### Kernel Information - -Verify the kernel version using the following steps: - -1. To check the kernel version of your Linux system, type the following command: - - ```shell - uname -srmv - ``` - - **Example:** The output of the command above lists the kernel version in the - following format: - - ```output - Linux 5.15.0-46-generic #44~20.04.5-Ubuntu SMP Fri Jun 24 13:27:29 UTC 2022 x86_64 - ``` - -2. Confirm that the obtained kernel version information matches with system - requirements as listed in {ref}`supported_distributions`. - -## Additional package repositories - -On some distributions the ROCm packages depend on packages outside the default -package repositories. These extra repositories need to be enabled before -installation. Follow the instructions below based on your distributions. - -::::::{tab-set} - -:::::{tab-item} Ubuntu -:sync: ubuntu - -All packages are available in the default Ubuntu repositories, therefore -no additional repositories need to be added. - -::::: -:::::{tab-item} Red Hat Enterprise Linux -:sync: RHEL - -::::{rubric} 1. 
Add the EPEL repository -:::: - -::::{tab-set} -:::{tab-item} RHEL 8 -:sync: RHEL-8 - -```shell -wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm -sudo rpm -ivh epel-release-latest-8.noarch.rpm -``` - -::: -:::{tab-item} RHEL 9 -:sync: RHEL-9 - -```shell -wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm -sudo rpm -ivh epel-release-latest-9.noarch.rpm -``` - -::: -:::: - -::::{rubric} 2. Enable the CodeReady Linux Builder repository -:::: - -Run the following command and follow the instructions. - -```shell -sudo crb enable -``` - -::::: -:::::{tab-item} SUSE Linux Enterprise Server -:sync: SLES - -Add the perl languages repository. - -::::{tab-set} -:::{tab-item} SLES 15.4 -:sync: SLES-15.4 - -```shell -zypper addrepo https://download.opensuse.org/repositories/devel:languages:perl/SLE_15_SP4/devel:languages:perl.repo -``` - -::: -:::{tab-item} SLES 15.5 -:sync: SLES-15.5 - -```shell -zypper addrepo https://download.opensuse.org/repositories/devel:/languages:/perl/15.5/devel:languages:perl.repo -``` - -::: -:::: -::::: -:::::: - -## Kernel headers and development packages - -The driver package uses -[{abbr}`DKMS (Dynamic Kernel Module Support)`][DKMS-wiki] to build -the `amdgpu-dkms` module (driver) for the installed kernels. This requires the -Linux kernel headers and modules to be installed for each. Usually these are -automatically installed with the kernel, but if you have multiple kernel -versions or you have downloaded the kernel images and not the kernel -meta-packages then they must be manually installed. - -[DKMS-wiki]: https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support - -To install for the currently active kernel run the command corresponding -to your distribution. 
- -::::{tab-set} -:::{tab-item} Ubuntu -:sync: ubuntu - -```shell -sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)" -``` - -::: - -:::{tab-item} Red Hat Enterprise Linux -:sync: RHEL - -```shell -sudo yum install kernel-headers kernel-devel -``` - -::: - -:::{tab-item} SUSE Linux Enterprise Server -:sync: SLES - -```shell -sudo zypper install kernel-default-devel -``` - -::: -:::: - -## Setting Permissions for Groups - -This section provides steps to add any current user to a video group to access -GPU resources. -Use of the video group is recommended for all ROCm-supported operating -systems. - -1. To check the groups in your system, issue the following command: - - ```shell - groups - ``` - -2. Add yourself to the `render` and `video` group using the command: - - ```shell - sudo usermod -a -G render,video $LOGNAME - ``` - -To add all future users to the `video` and `render` groups by default, run -the following commands: - -```shell -echo 'ADD_EXTRA_GROUPS=1' | sudo tee -a /etc/adduser.conf -echo 'EXTRA_GROUPS=video' | sudo tee -a /etc/adduser.conf -echo 'EXTRA_GROUPS=render' | sudo tee -a /etc/adduser.conf -``` diff --git a/docs/how-to/prerequisites.rst b/docs/how-to/prerequisites.rst new file mode 100644 index 00000000..9e78e840 --- /dev/null +++ b/docs/how-to/prerequisites.rst @@ -0,0 +1,189 @@ +Installation Prerequisites +########################## + +You must perform the following steps before installing ROCm and check if the +system meets all the requirements to proceed with the installation. + +Confirm the System Has a Supported Linux Distribution Version +************************************************************** + +The ROCm installation is supported only on specific Linux distributions and +kernel versions. 
+ +Check the Linux Distribution and Kernel Version on Your System +============================================================== + +This section discusses obtaining information about the Linux distribution and +kernel version. + +Linux Distribution Information +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Verify the Linux distribution using the following steps: + +1. To obtain the Linux distribution information, type the following command on + your system from the Command Line Interface (CLI): + + .. code-block:: shell + + uname -m && cat /etc/*release + +2. Confirm that the obtained Linux distribution information matches with those listed in :ref:`supported_distributions`. + + **Example:** Running the command above on an Ubuntu system results in the + following output: + + .. code-block:: shell + + x86_64 + DISTRIB_ID=Ubuntu + DISTRIB_RELEASE=20.04 + DISTRIB_CODENAME=focal + DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS" + +.. _check-kernel-info: + +Kernel Information +^^^^^^^^^^^^^^^^^^ + +Verify the kernel version using the following steps: + +1. To check the kernel version of your Linux system, type the following command: + + .. code-block:: shell + + uname -srmv + + **Example:** The output of the command above lists the kernel version in the + following format: + + .. code-block:: shell + + Linux 5.15.0-46-generic #44~20.04.5-Ubuntu SMP Fri Jun 24 13:27:29 UTC 2022 x86_64 + +2. Confirm that the obtained kernel version information matches with system + requirements as listed in :ref:`supported_distributions`. + +Additional package repositories +******************************* + +On some distributions the ROCm packages depend on packages outside the default +package repositories. These extra repositories need to be enabled before +installation. Follow the instructions below based on your distributions. + +.. tab-set:: + + .. tab-item:: Ubuntu + + All packages are available in the default Ubuntu repositories, therefore no additional repositories need to be added. + + .. 
tab-item:: Red Hat Enterprise Linux + + 1. Add the EPEL repository: + + .. tab-set:: + + + .. tab-item:: RHEL 8 + + .. code-block:: shell + + wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm + sudo rpm -ivh epel-release-latest-8.noarch.rpm + + .. tab-item:: RHEL 9 + + .. code-block:: shell + + wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm + sudo rpm -ivh epel-release-latest-9.noarch.rpm + + 2. Enable the CodeReady Linux Builder repository: + + .. code-block:: shell + + sudo crb enable + + .. tab-item:: SUSE Linux Enterprise Server + + Add the Perl language repository. + + .. tab-set:: + + .. tab-item:: SLES 15.4 + + .. code-block:: shell + + zypper addrepo https://download.opensuse.org/repositories/devel:/languages:/perl/15.4/devel:languages:perl.repo + + .. tab-item:: SLES 15.5 + + .. code-block:: shell + + zypper addrepo https://download.opensuse.org/repositories/devel:/languages:/perl/15.5/devel:languages:perl.repo + +Kernel Headers and Development Packages +*************************************** + +The driver package uses +:abbr:`DKMS (Dynamic Kernel Module Support)` [DKMS-wiki]_ to build +the ``amdgpu-dkms`` module (driver) for the installed kernels. This requires the +Linux kernel headers and modules to be installed for each kernel. Usually these are +automatically installed with the kernel, but if you have multiple kernel +versions or you have downloaded the kernel images and not the kernel +meta-packages, then they must be manually installed. + +.. [DKMS-wiki] https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support + +To install them for the currently active kernel, run the command corresponding +to your distribution. + +.. tab-set:: + + .. tab-item:: Ubuntu + + .. code-block:: shell + + sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)" + + .. tab-item:: Red Hat Enterprise Linux + + .. code-block:: shell + + sudo yum install kernel-headers kernel-devel + + + .. 
tab-item:: SUSE Linux Enterprise Server + + .. code-block:: shell + + sudo zypper install kernel-default-devel + +Setting Permissions for Groups +****************************** + +This section provides steps to add the current user to the ``video`` and ``render`` groups to access +GPU resources. +Use of the video group is recommended for all ROCm-supported operating +systems. + +1. To check which groups your user belongs to, issue the following command: + + .. code-block:: shell + + groups + +2. Add yourself to the ``render`` and ``video`` groups using the command: + + .. code-block:: shell + + sudo usermod -a -G render,video $LOGNAME + +To add all future users to the ``video`` and ``render`` groups by default, run +the following commands: + +.. code-block:: shell + + echo 'ADD_EXTRA_GROUPS=1' | sudo tee -a /etc/adduser.conf + echo 'EXTRA_GROUPS="video render"' | sudo tee -a /etc/adduser.conf diff --git a/docs/how-to/spack-intro.md b/docs/how-to/spack-intro.md deleted file mode 100644 index 747c55f1..00000000 --- a/docs/how-to/spack-intro.md +++ /dev/null @@ -1,421 +0,0 @@ -# Introduction to Spack - -Spack is a package management tool designed to support multiple software versions and -configurations on a wide variety of platforms and environments. It was designed for large -supercomputing centers, where many users share common software installations on clusters with -exotic architectures using libraries that do not have a standard ABI. Spack is non-destructive: installing -a new version does not break existing installations, so many configurations can coexist on the same -system. - -Most importantly, Spack is *simple*. It offers a simple *spec* syntax, so users can concisely specify -versions and configuration options. Spack is also simple for package authors: package files are written -in pure Python, and specs allow package authors to maintain a single file for many different builds of -the same package.
For more information on Spack, see -[https://spack-tutorial.readthedocs.io/en/latest/](https://spack-tutorial.readthedocs.io/en/latest/). - -## ROCM packages in Spack - -| **Component** | **Spack Package Name** | -|---------------------------|------------------------| -| **rocm-cmake** | rocm-cmake | -| **thunk** | hsakmt-roct | -| **rocm-smi-lib** | rocm-smi-lib | -| **hsa** | hsa-rocr-dev | -| **lightning** | llvm-amdgpu | -| **devicelibs** | rocm-device-libs | -| **comgr** | comgr | -| **rocclr (vdi)** | hip-rocclr | -| **hipify_clang** | hipify-clang | -| **hip (hip_in_vdi)** | hip | -| **ocl (opencl_on_vdi )** | rocm-opencl | -| **rocminfo** | rocminfo | -| **clang-ocl** | rocm-clang-ocl | -| **rccl** | rccl | -| **atmi** | atmi | -| **rocm_debug_agent** | rocm-debug-agent | -| **rocm_bandwidth_test** | rocm-bandwidth-test | -| **rocprofiler** | rocprofiler-dev | -| **roctracer-dev-api** | roctracer-dev-api | -| **roctracer** | roctracer-dev | -| **dbgapi** | rocm-dbgapi | -| **rocm-gdb** | rocm-gdb | -| **openmp-extras** | rocm-openmp-extras | -| **rocBLAS** | rocblas | -| **hipBLAS** | hipblas | -| **rocFFT** | rocfft | -| **rocRAND** | rocrand | -| **rocSPARSE** | rocsparse | -| **hipSPARSE** | hipsparse | -| **rocALUTION** | rocalution | -| **rocSOLVER** | rocsolver | -| **rocPRIM** | rocprim | -| **rocThrust** | rocthrust | -| **hipCUB** | hipcub | -| **hipfort** | hipfort | -| **ROCmValidationSuite** | rocm-validation-suite | -| **MIOpenGEMM** | miopengemm | -| **MIOpen(Hip variant)** | miopen-hip | -| **MIOpen(opencl)** | miopen-opencl | -| **MIVisionX** | mivisionx | -| **AMDMIGraphX** | migraphx | -| **rocm-tensile** | rocm-tensile | -| **hipfft** | hipfft | -| **RDC** | rdc | -| **hipsolver** | hipsolver | -| **mlirmiopen** | mlirmiopen | - -```{note} -You must install all prerequisites before installing Spack. 
-``` - -::::{tab-set} -:::{tab-item} Ubuntu -:sync: Ubuntu - -```shell -# Install some essential utilities: -apt-get update -apt-get install make patch bash tar gzip unzip bzip2 file gnupg2 git gawk -apt-get update -y -apt-get install -y xz-utils -apt-get build-essential -apt-get install vim -# Install Python: -apt-get install python3 -apt-get upgrade python3-pip -# Install Compilers: -apt-get install gcc -apt-get install gfortran -``` - -::: -:::{tab-item} SLES -:sync: SLES - -```shell -# Install some essential utilities: -zypper update -zypper install make patch bash tar gzip unzip bzip xz file gnupg2 git awk -zypper in -t pattern -zypper install vim -# Install Python: -zypper install python3 -zypper install python3-pip -# Install Compilers: -zypper install gcc -zypper install gcc-fortran -zypper install gcc-c++ -``` - -::: -:::{tab-item} CentOS -:sync: CentOS - -```shell -# Install some essential utilities: -yum update -yum install make -yum install patch bash tar yum install gzip unzip bzip2 xz file gnupg2 git gawk -yum group install "Development Tools" -yum install vim -# Install Python: -yum install python3 -pip3 install --upgrade pip -# Install compilers: -yum install gcc -yum install gcc-gfortran -yum install gcc-c++ -``` - -::: -:::: - -## Steps to build ROCm components using Spack - -1. To use the spack package manager, clone the Spack project from GitHub. - - ```bash - git clone - ``` - -2. Initialize Spack. - - The `setup-env.sh` script initializes the Spack environment. - - ```bash - cd spack - - . share/spack/setup-env.sh - ``` - - Spack commands are available once the above steps are completed. To list the available commands, - use `help`. - - ```bash - root@[ixt-rack-104:/spack\#](http://ixt-rack-104/spack) spack help - ``` - -## Using Spack to install ROCm components - -1. `rocm-cmake` - - Install the default variants and the latest version of `rocm-cmake`. 
- - ```bash - spack install rocm-cmake - ``` - - To install a specific version of `rocm-cmake`, use: - - ```bash - spack install rocm-cmake@ - ``` - - For example, `spack install rocm-cmake@5.2.0` - -2. `info` - - The `info**` command displays basic package information. It shows the preferred, safe, and - deprecated versions, in addition to the available variants. It also shows the dependencies with other - packages. - - ```bash - spack info mivisionx - ``` - - For example: - - ```bash - root@[ixt-rack-104:/spack\#](http://ixt-rack-104/spack) spack info mivisionx - CMakePackage: mivisionx - - Description: - MIVisionX toolkit is a set of comprehensive computer vision and machine - intelligence libraries, utilities, and applications bundled into a - single toolkit. - - Homepage: - - Preferred version: - 5.3.0 - - Safe versions: - 5.3.0 - 5.2.3 - 5.2.1 - 5.2.0 - 5.1.3 - 5.1.0 - 5.0.2 - 5.0.0 - 4.5.2 - 4.5.0 - - Deprecated versions: - 4.3.1 - 4.3.0 - 4.2.0 - 4.1.0 - 4.0.0 - 3.10.0 - 3.9.0 - 3.8.0 - 3.7.0 - 1.7 - - Variants: - Name [Default] When Allowed values Description - ==================== ==== ==================== ================================== - - build_type [Release] -- Release, Debug, CMake build type - RelWithDebInfo - hip [on] -- on, off Use HIP as backend - ipo [off] -- on, off CMake interprocedural optimization - opencl [off] -- on, off Use OPENCL as the backend - - Build Dependencies: - cmake ffmpeg libjpeg-turbo miopen-hip miopen-opencl miopengemm opencv openssl protobuf rocm-cmake rocm-opencl - - Link Dependencies: - miopen-hip miopen-opencl miopengemm openssl rocm-opencl - - Run Dependencies: - None - - root@[ixt-rack-104:/spack\#](http://ixt-rack-104/spack) - ``` - -## Installing variants for ROCm components - -The variants listed above indicate that the `mivisionx` package is built by default with -`build_type=Release` and the `hip` backend, and without the `opencl` backend. 
`build_type=Debug` and -`RelWithDebInfo`, with `opencl` and without `hip`, are also supported. - -For example: - -```bash -spack install mivisionx build_type=Debug (Backend will be hip since it is the default one) -spack install mivisionx+opencl build_type=Debug (Backend will be opencl and hip will be disabled as per the conflict defined in recipe) -``` - -* `spack spec` command - - To display the dependency tree, the `spack spec` command can be used with the same format. - - For example: - - ```bash - root@[ixt-rack-104:/spack\#](http://ixt-rack-104/spack) spack spec mivisionx - Input spec - -------------------------------- - mivisionx - - Concretized - -------------------------------- - mivisionx@5.3.0%gcc@9.4.0+hip\~ipo\~opencl build_type=Release arch=linux-ubuntu20.04-skylake_avx512 - ``` - -## Creating an environment - -You can create an environment with all the required components of your version. - -1. In the root folder, create a new folder when you can create a `.yaml` file. This file is used to -create an environment. - - ```bash - * mkdir /localscratch - * cd /localscratch - * vi sample.yaml - ``` - -2. Add all the required components in the `sample.yaml` file: - - ```bash - * spack: - * concretization: separately - * packages: - * all: - * compiler: [gcc@8.5.0] - * specs: - * - matrix: - * - ['%gcc@8.5.0\^cmake@3.19.7'] - * - [rocm-cmake@5.3.2, rocm-dbgapi@5.3.2, rocm-debug-agent@5.3.2, rocm-gdb@5.3.2, - * rocminfo@5.3.2, rocm-opencl@5.3.2, rocm-smi-lib@5.3.2, rocm-tensile@5.3.2, rocm-validation-suite@4.3.1, - * rocprim@5.3.2, rocprofiler-dev@5.3.2, rocrand@5.3.2, rocsolver@5.3.2, rocsparse@5.3.2, - * rocthrust@5.3.2, roctracer-dev@5.3.2] - * view: true - ``` - -3. Once you've created the `.yaml` file, you can use it to create an environment. - - ```bash - * spack env create -d /localscratch/MyEnvironment /localscratch/sample.yaml - ``` - -4. Activate the environment. - - ```bash - * spack env activate /localscratch/MyEnvironment - ``` - -5. 
Verify that you want all the component versions. - - ```bash - * spack find - this command will list out all components been in the environment (and 0 installed ) - ``` - -6. Install all the components in the `.yaml` file. - - ```bash - * cd /localscratch/MyEnvironment - * spack install -j 50 - ``` - -7. Check that all components are successfully installed. - - ```bash - * spack find - ``` - -8. If any modification is made to the `.yaml` file, you must deactivate the existing environment and create a new one in order for the modications to be reflected. - - To deactivate, use: - - ```bash - * spack env deactivate - ``` - -## Create and apply a patch before installation - -Spack installs ROCm packages after pulling the source code from GitHub and building it locally. In -order to build a component with any modification to the source code, you must generate a patch and -apply it before the build phase. - -To generate a patch and build with the changes: - -1. Stage the source code. - - ```bash - spack stage hip@5.2.0 (This will pull the 5.2.0 release version source code of hip and display the path to spack-src directory where entire source code is available) - - root@[ixt-rack-104:/spack#](http://ixt-rack-104/spack) spack stage hip@5.2.0 - ==> Fetching - ==> Fetching - ==> Fetching - ==> Moving resource stage - source: /tmp/root/spack-stage/resource-hipamd-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src/ - destination: /tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src/hipamd - ==> Moving resource stage - source: /tmp/root/spack-stage/resource-opencl-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src/ - destination: /tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src/opencl - ==> Moving resource stage - source: /tmp/root/spack-stage/resource-rocclr-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src/ - destination: /tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src/rocclr - ==> Staged 
hip in /tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7 - ``` - -2. Change directory to `spack-src` inside the staged directory. - - ```bash - root@[ixt-rack-104:/spack#cd /tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7](http://ixt-rack-104/spack) - root@[ixt-rack-104:/tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7#](http://ixt-rack-104/tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7) cd spack-src/ - ``` - -3. Create a new Git repository. - - ```bash - root@[ixt-rack-104:/tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src#](http://ixt-rack-104/tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src) git init - ``` - -4. Add the entire directory to the repository. - - ```bash - root@[ixt-rack-104:/tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src#](http://ixt-rack-104/tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src) git add . - ``` - -5. Make the required changes to the source code. - - ```bash - root@[ixt-rack-104:/tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src#](http://ixt-rack-104/tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src) vi hipamd/CMakeLists.txt (Make required changes in the source code) - ``` - -6. Generate the patch using the `git diff` command. - - ```bash - diff > /spack/var/spack/repos/builtin/packages/hip/0001-modifications.patch - ``` - -7. Update the recipe with the patch file name and any conditions you want to apply. 
- - ```bash - root@[ixt-rack-104:/tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src#](http://ixt-rack-104/tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src) spack edit hip - ``` - - Provide the patch file name and the conditions for the patch: - - `patch("0001-modifications.patch", when="@5.2.0")` - - Spack applies `0001-modifications.patch` on the `5.2.0` release code before starting the `hip` build. - - After each modification, you must update the recipe. If there is no change to the recipe, run - `touch /spack/var/spack/repos/builtin/packages/hip/package.py` diff --git a/docs/how-to/spack.rst b/docs/how-to/spack.rst new file mode 100644 index 00000000..c523cf47 --- /dev/null +++ b/docs/how-to/spack.rst @@ -0,0 +1,476 @@ +Introduction to Spack +##################### + +Spack is a package management tool designed to support multiple software versions and +configurations on a wide variety of platforms and environments. It was designed for large +supercomputing centers, where many users share common software installations on clusters with +exotic architectures using libraries that do not have a standard ABI. Spack is non-destructive: installing +a new version does not break existing installations, so many configurations can coexist on the same +system. + +Most importantly, Spack is *simple*. It offers a simple *spec* syntax, so users can concisely specify +versions and configuration options. Spack is also simple for package authors: package files are written +in pure Python, and specs allow package authors to maintain a single file for many different builds of +the same package. For more information on Spack, see the +`Spack Tutorial `_. + +ROCM packages in Spack +********************** + +.. 
list-table:: + :header-rows: 1 + + * - Component + - ``Spack Package Name`` + + * - ``rocm-cmake`` + - ``rocm-cmake`` + * - ``thunk`` + - ``hsakmt-roct`` + * - ``rocm-smi-lib`` + - ``rocm-smi-lib`` + * - ``hsa`` + - ``hsa-rocr-dev`` + * - ``lightning`` + - ``llvm-amdgpu`` + * - ``devicelibs`` + - ``rocm-device-libs`` + * - ``comgr`` + - ``comgr`` + * - ``rocclr (vdi)`` + - ``hip-rocclr`` + * - ``hipify_clang`` + - ``hipify-clang`` + * - ``hip (hip_in_vdi)`` + - ``hip`` + * - ``ocl (opencl_on_vdi )`` + - ``rocm-opencl`` + * - ``rocminfo`` + - ``rocminfo`` + * - ``clang-ocl`` + - ``rocm-clang-ocl`` + * - ``rccl`` + - ``rccl`` + * - ``atmi`` + - ``atmi`` + * - ``rocm_debug_agent`` + - ``rocm-debug-agent`` + * - ``rocm_bandwidth_test`` + - ``rocm-bandwidth-test`` + * - ``rocprofiler`` + - ``rocprofiler-dev`` + * - ``roctracer-dev-api`` + - ``roctracer-dev-api`` + * - ``roctracer`` + - ``roctracer-dev`` + * - ``dbgapi`` + - ``rocm-dbgapi`` + * - ``rocm-gdb`` + - ``rocm-gdb`` + * - ``openmp-extras`` + - ``rocm-openmp-extras`` + * - ``rocBLAS`` + - ``rocblas`` + * - ``hipBLAS`` + - ``hipblas`` + * - ``rocFFT`` + - ``rocfft`` + * - ``rocRAND`` + - ``rocrand`` + * - ``rocSPARSE`` + - ``rocsparse`` + * - ``hipSPARSE`` + - ``hipsparse`` + * - ``rocALUTION`` + - ``rocalution`` + * - ``rocSOLVER`` + - ``rocsolver`` + * - ``rocPRIM`` + - ``rocprim`` + * - ``rocThrust`` + - ``rocthrust`` + * - ``hipCUB`` + - ``hipcub`` + * - ``hipfort`` + - ``hipfort`` + * - ``ROCmValidationSuite`` + - ``rocm-validation-suite`` + * - ``MIOpenGEMM`` + - ``miopengemm`` + * - ``MIOpen(Hip variant)`` + - ``miopen-hip`` + * - ``MIOpen(opencl)`` + - ``miopen-opencl`` + * - ``MIVisionX`` + - ``mivisionx`` + * - ``AMDMIGraphX`` + - ``migraphx`` + * - ``rocm-tensile`` + - ``rocm-tensile`` + * - ``hipfft`` + - ``hipfft`` + * - ``RDC`` + - ``rdc`` + * - ``hipsolver`` + - ``hipsolver`` + * - ``mlirmiopen`` + - ``mlirmiopen`` + +.. note:: + You must install all prerequisites before installing Spack. + + +.. 
tab-set:: + .. tab-item:: Ubuntu + :sync: Ubuntu + + .. code-block:: shell + + # Install some essential utilities: + apt-get update + apt-get install make patch bash tar gzip unzip bzip2 file gnupg2 git gawk + apt-get update -y + apt-get install -y xz-utils + apt-get install build-essential + apt-get install vim + # Install Python: + apt-get install python3 + apt-get install python3-pip + # Install compilers: + apt-get install gcc + apt-get install gfortran + + .. tab-item:: SLES + :sync: SLES + + .. code-block:: shell + + # Install some essential utilities: + zypper update + zypper install make patch bash tar gzip unzip bzip2 xz file gnupg2 git awk + zypper in -t pattern + zypper install vim + # Install Python: + zypper install python3 + zypper install python3-pip + # Install compilers: + zypper install gcc + zypper install gcc-fortran + zypper install gcc-c++ + + .. tab-item:: CentOS + :sync: CentOS + + .. code-block:: shell + + # Install some essential utilities: + yum update + yum install make + yum install patch bash tar gzip unzip bzip2 xz file gnupg2 git gawk + yum group install "Development Tools" + yum install vim + # Install Python: + yum install python3 + pip3 install --upgrade pip + # Install compilers: + yum install gcc + yum install gcc-gfortran + yum install gcc-c++ + +Steps to build ROCm components using Spack +****************************************** + +1. To use the Spack package manager, clone the Spack project from GitHub. + + .. code-block:: shell + + git clone + +2. Initialize Spack. + + The ``setup-env.sh`` script initializes the Spack environment. + + .. code-block:: shell + + cd spack + + . share/spack/setup-env.sh + + Spack commands are available once the above steps are completed. To list the available commands, + use ``spack help``. + + .. code-block:: shell + + spack help + +Using Spack to install ROCm components +************************************** + +1.
``rocm-cmake`` + + Install the default variants and the latest version of ``rocm-cmake``. + + .. code-block:: shell + + spack install rocm-cmake + + To install a specific version of ``rocm-cmake``, use: + + .. code-block:: shell + + spack install rocm-cmake@ + + For example, ``spack install rocm-cmake@5.2.0``. + +2. ``info`` + + The ``info`` command displays basic package information. It shows the preferred, safe, and + deprecated versions, in addition to the available variants. It also shows the dependencies on other + packages. + + .. code-block:: shell + + spack info mivisionx + + For example: + + + .. code-block:: shell + + spack info mivisionx + CMakePackage: mivisionx + + Description: + MIVisionX toolkit is a set of comprehensive computer vision and machine + intelligence libraries, utilities, and applications bundled into a + single toolkit. + + Homepage: + + Preferred version: + 5.3.0 + + Safe versions: + 5.3.0 + 5.2.3 + 5.2.1 + 5.2.0 + 5.1.3 + 5.1.0 + 5.0.2 + 5.0.0 + 4.5.2 + 4.5.0 + + Deprecated versions: + 4.3.1 + 4.3.0 + 4.2.0 + 4.1.0 + 4.0.0 + 3.10.0 + 3.9.0 + 3.8.0 + 3.7.0 + 1.7 + + Variants: + Name [Default] When Allowed values Description + ==================== ==== ==================== ================================== + + build_type [Release] -- Release, Debug, CMake build type + RelWithDebInfo + hip [on] -- on, off Use HIP as backend + ipo [off] -- on, off CMake interprocedural optimization + opencl [off] -- on, off Use OPENCL as the backend + + Build Dependencies: + cmake ffmpeg libjpeg-turbo miopen-hip miopen-opencl miopengemm opencv openssl protobuf rocm-cmake rocm-opencl + + Link Dependencies: + miopen-hip miopen-opencl miopengemm openssl rocm-opencl + + Run Dependencies: + None + +Installing variants for ROCm components +*************************************** + +The variants listed above indicate that the ``mivisionx``
package is built by +default with ``build_type=Release`` and the ``hip`` backend, and without the +``opencl`` backend. The ``Debug`` and ``RelWithDebInfo`` build types are also supported, +as is building with ``opencl`` instead of ``hip``. + +For example: + +.. code-block:: shell + + spack install mivisionx build_type=Debug # Backend is hip, since it is the default + spack install mivisionx+opencl build_type=Debug # Backend is opencl; hip is disabled as per the conflict defined in the recipe + + +* ``spack spec`` command + + To display the dependency tree, the ``spack spec`` command can be used with the same format. + + For example: + + .. code-block:: shell + + spack spec mivisionx + Input spec + -------------------------------- + mivisionx + + Concretized + -------------------------------- + mivisionx@5.3.0%gcc@9.4.0+hip~ipo~opencl build_type=Release arch=linux-ubuntu20.04-skylake_avx512 + +Creating an environment +*********************** + +You can create an environment with all the required components of your version. + +1. In the root folder, create a new folder where you can create a ``.yaml`` file. This file is used to +create an environment. + + .. code-block:: shell + + mkdir /localscratch + cd /localscratch + vi sample.yaml + +2. Add all the required components in the ``sample.yaml`` file: + + .. code-block:: shell + + spack: + concretization: separately + packages: + all: + compiler: [gcc@8.5.0] + specs: + - matrix: + - ['%gcc@8.5.0^cmake@3.19.7'] + - [rocm-cmake@5.3.2, rocm-dbgapi@5.3.2, rocm-debug-agent@5.3.2, rocm-gdb@5.3.2, + rocminfo@5.3.2, rocm-opencl@5.3.2, rocm-smi-lib@5.3.2, rocm-tensile@5.3.2, rocm-validation-suite@4.3.1, + rocprim@5.3.2, rocprofiler-dev@5.3.2, rocrand@5.3.2, rocsolver@5.3.2, rocsparse@5.3.2, + rocthrust@5.3.2, roctracer-dev@5.3.2] + view: true + +3. Once you've created the ``.yaml`` file, you can use it to create an environment. + + ..
code-block:: shell + + spack env create -d /localscratch/MyEnvironment /localscratch/sample.yaml + +4. Activate the environment. + + .. code-block:: shell + + spack env activate /localscratch/MyEnvironment + +5. Verify that the environment lists all the component versions you want. + + .. code-block:: shell + + spack find # Lists all components in the environment (0 installed at this point) + +6. Install all the components in the ``.yaml`` file. + + .. code-block:: shell + + cd /localscratch/MyEnvironment + spack install -j 50 + +7. Check that all components are successfully installed. + + .. code-block:: shell + + spack find + +8. If any modification is made to the ``.yaml`` file, you must deactivate the existing environment and create a new one in order for the modifications to be reflected. + + To deactivate, use: + + .. code-block:: shell + + spack env deactivate + +Create and apply a patch before installation +******************************************** + +Spack installs ROCm packages after pulling the source code from GitHub and building it locally. In +order to build a component with any modification to the source code, you must generate a patch and +apply it before the build phase. + +To generate a patch and build with the changes: + +1. Stage the source code. + + ..
code-block:: shell + + # Pulls the 5.2.0 release source code of hip and displays the path to the spack-src directory where the entire source code is available + + spack stage hip@5.2.0 + ==> Fetching + ==> Fetching + ==> Fetching + ==> Moving resource stage + source: /tmp/root/spack-stage/resource-hipamd-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src/ + destination: /tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src/hipamd + ==> Moving resource stage + source: /tmp/root/spack-stage/resource-opencl-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src/ + destination: /tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src/opencl + ==> Moving resource stage + source: /tmp/root/spack-stage/resource-rocclr-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src/ + destination: /tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7/spack-src/rocclr + ==> Staged hip in /tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7 + +2. Change directory to ``spack-src`` inside the staged directory. + + .. code-block:: shell + + cd /tmp/root/spack-stage/spack-stage-hip-5.2.0-wzo5y6ysvmadyb5mvffr35galb6vjxb7 + cd spack-src/ + +3. Create a new Git repository. + + .. code-block:: shell + + git init + +4. Add the entire directory to the repository. + + ..
code-block:: shell + + git add . + +5. Make the required changes to the source code. + + .. code-block:: shell + + vi hipamd/CMakeLists.txt # Make the required changes in the source code + +6. Generate the patch using the ``git diff`` command. + + .. code-block:: shell + + git diff > /spack/var/spack/repos/builtin/packages/hip/0001-modifications.patch + +7. Update the recipe with the patch file name and any conditions you want to apply. + + .. code-block:: shell + + spack edit hip + + Provide the patch file name and the conditions for the patch: + + ``patch("0001-modifications.patch", when="@5.2.0")`` + + Spack applies ``0001-modifications.patch`` to the ``5.2.0`` release code before starting the ``hip`` build. + + After each modification, you must update the recipe. If there is no change to the recipe, run + ``touch /spack/var/spack/repos/builtin/packages/hip/package.py`` diff --git a/docs/reference/3rd-party-support-matrix.rst b/docs/reference/3rd-party-support-matrix.rst index 7263d9ac..51b03499 100644 --- a/docs/reference/3rd-party-support-matrix.rst +++ b/docs/reference/3rd-party-support-matrix.rst @@ -140,4 +140,4 @@ contemporary CUDA / NVIDIA HPC SDK alternatives. - 1.17.2 - 22.9 -For the latest documentation of these libraries, refer to :doc:`API libraries <../../reference/library-index.md>`.
+For the latest documentation of these libraries, refer to :doc:`API libraries `. diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in index 94767a67..3e3ed05b 100644 --- a/docs/sphinx/_toc.yml.in +++ b/docs/sphinx/_toc.yml.in @@ -27,27 +27,19 @@ subtrees: title: Post-Install Instructions - file: how-to/native-install/package-manager-integration title: Package Manager Integration - - file: how-to/deep-learning-rocm + - file: how-to/prerequisites + - file: how-to/docker + - file: how-to/spack + - file: how-to/3rd-party/index subtrees: - entries: - file: how-to/3rd-party/magma-install - file: how-to/3rd-party/pytorch-install - file: how-to/3rd-party/tensorflow-install -- caption: Understand ROCm - entries: - - title: Compiler Disambiguation - file: understand/compiler_disambiguation - - file: understand/isv_deployment_win - - file: understand/cmake_packages - - file: understand/file_reorg - - caption: Reference entries: - file: reference/system-requirements - file: reference/3rd-party-support-matrix - file: reference/docker-image-support-matrix - -- caption: About - entries: - - file: about + - file: reference/user-kernel-space-compat-matrix