Radeon VII #175
Hi, that would be really interesting! Most of the ROCm components seem to have had in-code support for the Radeon VII/gfx906 in place by default, and a couple of weeks ago I went through all the common places that typically require patching. I have tested that everything should now build for these cards, but I have not been able to test functionality on a VII. That said, if you have time it would be great if you could try the build and test it. These steps should help you get started.
Once the build has progressed past rocminfo and amd-smi, those commands would be a good way to start checking the build.
The HIP and OpenCL compiler tests should also be doable pretty soon (no need to wait for the whole build to finish).
Once the build has finished, if things work well, then PyTorch should also support your GPU.
If those work, then you can also build llama.cpp, stable-diffusion-webui and vllm with the command: ./babs.sh -b binfo/extra/ai_tools.blist All of those also have their own example apps that you can run either on the console or by starting their web server and connecting to it via browser. (I can help more later if needed) |
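Once the build reaches that point, a quick presence check of the key tools can be scripted. This is only a sketch: the helper name and the scratch-directory demo are mine; the tool names and the /opt/rocm_sdk_612 prefix come from this thread.

```shell
# Check that key ROCm SDK tools landed under the install prefix.
# check_tools is a hypothetical helper, not part of rocm_sdk_builder.
check_tools() {
  local prefix="$1"; shift
  local missing=0
  for tool in "$@"; do
    [ -x "$prefix/bin/$tool" ] || { echo "missing: $tool"; missing=1; }
  done
  return $missing
}

# Demo against a scratch prefix with fake binaries
# (a real run would use: check_tools /opt/rocm_sdk_612 rocminfo amd-smi hipcc):
prefix=$(mktemp -d)
mkdir -p "$prefix/bin"
for t in rocminfo amd-smi hipcc; do : > "$prefix/bin/$t"; chmod +x "$prefix/bin/$t"; done
check_tools "$prefix" rocminfo amd-smi hipcc && echo "all tools present"
```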
Thanks :-) I'll try that asap |
Hello @lamikr , Thank you for your amazing work! I am really glad I found this repo. I have two AMD MI60 cards (gfx906). I will also compile this repo and share test results with you! I am specifically interested in vLLM batch/concurrent inference speeds. So far, I was not able to compile vLLM with default installations of ROCm 6.2.2 and vLLM. There is also a composable_kernel based flash attention implementation here - https://github.com/ROCm/flash-attention (v2.6.3). This FA compiles fine with default ROCm 6.2.2 on Ubuntu 22.04, but the exllamav2 backend with Llama 3 8B started generating gibberish text (exllamav2 works fine without FA2, but it is very slow without it). I hope this repo fixes this gibberish text generation problem with FA2. Thanks again! |
Quick update. I did a fresh installation of Ubuntu 24.04.1 today, which takes around 6.5GB of SSD storage. It installs Nvidia GPU drivers by default. I assumed this repo would install AMD GPU drivers, but no, it did not. Probably this should be mentioned in the README with a brief description of how to install GPU drivers. So, I installed the AMD GPU drivers as follows:
Also, several packages were missing in Ubuntu which I had to install after seeing error messages from ./install_deps.sh.
Only after that was I able to run ./install_deps.sh without errors. Another piece of feedback: can you please include a global progress bar in the terminal logs showing how many packages have been built and how many remain? |
ok, I want to report an error that occurred while building the source code.
Attaching the full error output. Short info about my PC: I ran the following commands and they worked.
rocminfo correctly showed those two MI60 cards. The hipcc and opencl examples worked without errors. Please let me know if I need to install CUDA libraries, or otherwise how I can fix the error above. Thanks! |
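As a sanity check on agent enumeration, the gfx906 count can be pulled straight out of rocminfo output. The helper below is a hypothetical convenience, demoed here on canned text; a real run would pipe `rocminfo` into it and should report 2 for a dual-MI60 box.

```shell
# Count lines mentioning gfx906 (e.g. agent Name fields in rocminfo output).
count_gfx906() {
  grep -c 'gfx906'
}

# Real usage (assumes rocminfo on PATH): rocminfo | count_gfx906
# Demo on canned output:
printf 'Name: gfx906\nName: gfx906\nName: gfx900\n' | count_gfx906   # → 2
```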
@lamikr , I think the error I am seeing might be related to spack/spack#45411, but I am not sure how to implement the fix here. Let me know. Thanks! |
Quick update. The installation is working after I removed all Nvidia drivers and restarted my PC.
Now Ubuntu is using the X.Org Nouveau driver. |
Finally, the ROCm SDK was installed on my PC after 5 hours. It takes ~90GB of space in rocm_sdk_builder, 8.5GB in the triton folder, ~2GB in the lib/x86_64-linux-gnu folder (mostly LLVM) and ~20GB in the opt/rocm_sdk_612 folder - a total of 120GB of files! Is there a way to create an installable version of my current setup (all 120GB)? It is huge and time-consuming. For comparison, a ROCm installation from binaries takes around 30GB. |
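Per-directory totals like those quoted above come straight from `du`. This sketch defines a small reporting helper (the name is mine) and demos it on scratch directories rather than the real ~120GB tree; the real directory names would be builddir, src_projects, /opt/rocm_sdk_612, etc.

```shell
# Print size-in-KiB and path for each directory given.
report_usage() {
  for d in "$@"; do
    printf '%sK\t%s\n' "$(du -sk "$d" | cut -f1)" "$d"
  done
}

# Demo on scratch directories (placeholders for the real build tree):
root=$(mktemp -d)
mkdir -p "$root/builddir" "$root/src_projects"
head -c 4096 /dev/zero > "$root/builddir/obj.o"
report_usage "$root/builddir" "$root/src_projects"
```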
Here are the benchmark results. I think the flash attention test failed.
|
That error above is causing llama.cpp to not run any models on the GPU. Let me file a bug. |
@lamikr finally got around to doing the testing. Initially the build went smooth-ish after setting HIP_COMPILER=clang |
Hi, thanks for the reports. The flash attention support for gfx906 would need to be implemented in aotriton. Although I do not have a gfx906, I will start a new build for it with Ubuntu 24.04 and try to reproduce the build errors. If you have some fixes, are you able to make a pull request? |
hey @lamikr The build is on LinuxMint Debian Edition, if need be i can make pull requests |
I have multiple versions of it under src_projects directory
I am not sure what is causing it. Maybe the install directory /opt/rocm_sdk_612 should also be removed before starting a clean build. Let's try to reset everything and then start a fresh build.
I have not yet solved the llama.cpp error with gfx906, but I am trying to add more debugging for that to the next build. |
I can get as far as running the HIP and CL hello worlds, but cannot run the run-and-save-benchmarks script. Relevant excerpts from the CMake output:
-- MIGraphX is using hipRTC
CMake Error in src/py/CMakeLists.txt:
in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
CMake Warning (dev) at /opt/rocm_sdk_612/share/rocmcmakebuildtools/cmake/ROCMTest.cmake:230 (install): RPATH entries for target 'test_verify' will not be escaped in the
-- Generating done (2.5s) |
I have now made a couple of docker images from the quite new rocm_sdk_builder for different GPU architectures.
The docker image for CDNA cards supports gfx906 and works at least with the MI50, so I believe it could also work with the Vega VII. The problem is that even xz-compressed, these images are about 6GB (and uncompressed about 50GB). These images have every application from rocm sdk builder included. Do you know whether it is possible to upload xz-compressed docker images to Docker Hub? |
@Said-Akbar, @cb88 and @commandline-be Here are the commands I used to import and run it: (tmpdir was needed because the image is so big that the import would fail if the regular /tmp memory dir is used)
Now I would need some location to upload it to. |
Hello @lamikr , I checked Docker Hub and they have a limit of 2GB. Since that is not enough for this container, I suggest you create a compressed file and upload it to a Google Drive. Each new gmail account has 15GB of free storage. Then you can make the folder public and share it with us. Thanks! |
Hi @Said-Akbar, I did what you suggested :-) https://drive.google.com/drive/folders/1XnoSvL41XhrKT_5NbBSrUZ_1LaVpQ-xb Let me know how it works. I also put there a file with some instructions, as at least I needed to change the TMPDIR location during import to avoid running out of memory. |
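The TMPDIR workaround mentioned in those instructions amounts to pointing import scratch space at a disk-backed directory instead of the tmpfs-backed /tmp. A minimal sketch; the podman line in the comment is illustrative, and the archive name is a placeholder.

```shell
# Create a scratch directory on real disk (under $HOME rather than tmpfs /tmp)
# and point TMPDIR at it before importing the large image.
bigtmp=$(mktemp -d -p "${HOME:-/tmp}")
export TMPDIR="$bigtmp"

# The real import would then be something like (podman assumed):
#   TMPDIR="$bigtmp" podman load -i rocm_sdk_cdna.tar.xz
echo "TMPDIR=$TMPDIR"
```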
Thank you! I will try it tomorrow. |
GitHub offers a free docker registry for public repos. The size limit is 10GiB per layer. |
Do you have experience doing docker builds on GitHub, and what does "layer" mean here? I would think that the end-image size is at least 10-50 GB uncompressed, and compressed it would be about 2-6 GB. At the moment I am doing the build by using the files in the folder
I have split the Dockerfile into multiple parts because I hit some kind of image merge error at the end when I tried to build too many projects in one step. At the moment I first comment everything out in the Dockerfile after the line `
and then I will call, for example, a script:
Once that step finishes and the image is saved, I uncomment the line asking to build the second set and then launch the ./build_rocm_sdk_container_rdna3.sh script again. Maybe there is some better/smarter way of doing the commits between RUN commands that I am missing. Something similar to calling "podman tag" directly from the Dockerfile between RUN commands. Or alternatively there could be a script that adds those RUN commands dynamically to the Dockerfile after each successful build phase. That would be nice to try on GitHub. |
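The phase-by-phase driver with error checking between steps can be sketched as a tiny shell wrapper; the phase commands below are stand-ins for the real ./build_rocm_sdk_container_rdna3.sh invocations.

```shell
# Run one build phase; stop the chain if it fails.
run_phase() {
  echo "phase: $*"
  if "$@"; then
    echo "ok: $*"
  else
    echo "FAILED: $*" >&2
    return 1
  fi
}

# Chain phases so the driver stops at the first failure
# ('true' and 'echo' stand in for the real per-set build scripts):
run_phase true && run_phase echo "second set" && echo "all phases ok"
```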
Most (if not all) Dockerfile instructions result in a new layer being created. You may examine the layers of existing images using
I guess the size limit applies to the compressed size of each layer, but I am not very sure.
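One real command for examining layers is `docker history <image>` (or `podman history`), which lists each layer with its size; summing the size column gives the image total. The summing helper below is my own convenience, demoed on canned byte counts (the image name in the comment is hypothetical).

```shell
# Sum a column of per-layer byte counts from stdin.
sum_layer_bytes() {
  awk '{ s += $1 } END { print s }'
}

# Real usage (docker assumed), after converting the human-readable SIZE
# column to plain bytes:
#   docker history myimage   # then feed the byte counts to sum_layer_bytes
# Demo on canned per-layer byte counts:
printf '100\n250\n50\n' | sum_layer_bytes   # → 400
```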
That's bad practice even if it succeeds in merging, since a massive image layer is very hard to upload/download. Splitting build steps into multiple smaller layers is preferable. To resume from an intermediate layer, use the layer id (hash) in FROM:
FROM 1234567890ab
# ... remaining steps ...
and build it.
For example:

```dockerfile
FROM ubuntu:24.04
# ... omitted ...
WORKDIR /
RUN apt update && apt install -y git git-lfs sudo vim
RUN git clone --recursive https://github.com/lamikr/rocm_sdk_builder.git
WORKDIR /rocm_sdk_builder
RUN git checkout master
RUN ./docs/notes/containers/scripts/preconfig_ubuntu.sh
RUN ./docs/notes/containers/scripts/select_target_gpus.sh ${ENV_VAR__TARGET_GPU_CFG_FILE_ALL}
```
I noticed that your Dockerfile does some cleanup jobs in the final layers. That is useless, as all deleted files remain in the previous layers (each layer, once created, is immutable). |
Thanks for the suggestions... So basically you suggest that the Dockerfile itself only has the commands for creating the very base image, and all the other "docker run" commands that follow are called from a shell script instead of being added to the Dockerfile itself. I did not realize it could be done that way. I had thought I would need to modify the Dockerfile dynamically from the script between each build step by using "echo "MY command" >> Dockerfile". Error checking needs to be added between each step anyway so that the script stops if one of the steps fails. The reason for the cleanup task at the end is that this way the exported image created with "podman export" is smaller. It will contain only the files that are in the image at the final step. The image now shared on the gmail fileshare was made that way; it let me reduce the size of the exported image by tens of gigabytes. A user of the docker image can get those files back if they want by using babs.sh commands. It even allows updating and rebuilding it partially with commands like:
|
The multi-stage build flattens everything into a single layer:

```dockerfile
FROM ubuntu:24.04 AS builder
# ... all build steps ...

FROM scratch
COPY --from=builder / /
# Now you get a single-layer image
```

That being said, the best practice for building a Docker image usually follows these practices:
If you'd like to set up GitHub Actions to build Docker images, there is a limit that "each job in a workflow can run for up to 6 hours of execution time". To work around the limit, splitting the Dockerfile is needed anyway so that the build workflow can be split into multiple jobs - several jobs build some groups of dependencies, and they are finally aggregated in jobs that build the last monsters (pytorch, etc.). This strategy has other advantages, e.g. if two or more components do not depend on each other, they can be built in different jobs (thus on different GHA runners) simultaneously, cutting the total build time down greatly. The current build process of the project follows a linear dependency chain (components are built one by one). Could you make a (rough) dependency graph of all components built by this project? Using such information, building some components simultaneously would be possible, and I am willing to help write proper Dockerfiles and GHA workflows. |
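The fan-out/fan-in job layout described above can be miniaturized with plain shell background jobs (the component names are hypothetical): independent components build concurrently, then the aggregate step waits for both before building the final monster.

```shell
# Stand-in for a component build job.
build() { echo "building $1"; echo "done $1"; }

# Fan-out: independent components build in parallel background jobs.
build compA & build compB &

# Fan-in: block until both background builds finish, then build the
# component that depends on them.
wait
build final_monster
```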
Thanks for this guide. Sadly I've not managed to push beyond the 'build failed: roctracer' situation. It fails on 'hipGetDevicePropertiesR0600' with an undefined reference at MatrixTranspose_test.cpp:(.text+0x322) and again at MatrixTranspose_test.cpp:(.text+0x360). I'm building this on Debian, which afaik is not throwing any compatibility issue but does fail to build. It seems also that the build script is not aligned with the actual code tree, in that I do not find env_rocm.sh in /opt/rocm/bin for example, but I do find it under ./binfo/env etc. Please assist
|
After running babs -rs the result was the same; the file env_rocm.sh was not found in /opt/rocm/bin before or after. |
@commandline-be It should be by default in
(not in /opt/rocm/bin/env_rocm.sh, as /opt/rocm folder is usually used by the AMD's own rocm builds) Can you check whether you have /opt/rocm_sdk_612/bin/env_rocm.sh? If yes, then you should also find some example apps to test on. For example:
If not, let's try to do with smaller steps to find out what is the problem. The env-variable script should be installed alredy by the first package, so we can try to build only that one.
After these commands you should have that script installed and only a couple of other files. |
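The check above boils down to looking for the env script under the SDK prefix rather than under /opt/rocm. A small sketch (the helper name is mine; it is demoed on a scratch prefix, while a real run would pass /opt/rocm_sdk_612):

```shell
# Report whether env_rocm.sh exists under a given install prefix.
find_env_script() {
  if [ -f "$1/bin/env_rocm.sh" ]; then
    echo "found: $1/bin/env_rocm.sh"
  else
    echo "not found under $1"
    return 1
  fi
}

# Real usage: find_env_script /opt/rocm_sdk_612
# Demo with a scratch prefix:
demo=$(mktemp -d); mkdir -p "$demo/bin"; : > "$demo/bin/env_rocm.sh"
find_env_script "$demo"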
@Rongronggg9 Thanks for the feedback. I will try to do the Dockerfile and GitHub Actions for the GitHub build now... I was thinking something like this for the base image that would be built by the first command. It should stay under the time and space limits for a single layer.
And then for GitHub Actions, I could first try something like this: build the base image first with the Dockerfile, and then run a single action to build llvm with a separate command after that, to create a second layer.
|
Progress, it seems ... [ 0%] Building C object external/llvm-project/llvm/lib/Support/BLAKE3/CMakeFiles/LLVMSupportBlake3.dir/blake3.c.o and eventually [ 0%] Built target MLIRTableGen |
After apt install miopen-hip-dev and then babs -b again, I now get:
Dependencies file "external/llvm-project/llvm/lib/Support/CMakeFiles/LLVMSupport.dir/circular_raw_ostream.cpp.o.d" is newer than depends file "/home/user/src/rocm_sdk_builder/builddir/031_01_miopen_rocMLIR/external/llvm-project/llvm/lib/Support/CMakeFiles/LLVMSupport.dir/compiler_depend.internal".
And also, after apt install llvm-dev:
make[2]: Nothing to be done for 'external/llvm-project/llvm/tools/mlir/lib/TableGen/CMakeFiles/MLIRTableGen.dir/build'. |
You should not need to install the rocm deb files; those are most likely messing up the build somehow. I say that because in the AmdDeviceLibsIncGen.py file that was in your error message, line 25 has the following: def generate(outputPath: Path, rocmPath: Path, libs: List[str]) -> None: Can you print the output of the following commands:
|
Thanks for the informative feedback.
should I consider running a |
I've now removed anything related to rocm installed on the OS by the package manager. The build now appears to continue. Thanks. I had assumed the build process would not ingest anything from the OS. |
After a long build it now fails at pytorch. I report the output below, should it matter. I'm restarting the entire build after a clean (-rs). cc1plus: all warnings being treated as errors --- before this, FAILED is reported: [41/2619] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/UtilsAvx512.cc.o |
Thanks, this was good information. Hard to know what exactly caused that. Maybe there were some libraries or header files under the /usr directory that confused it. I wish it had been some environment variable like ROCM_HOME, because that I could have solved/fixed more easily by redefining it in the envsetup.sh that babs uses. Any chance you could check which deb files you removed? (history command)
@commandline-be If you are still seeing the same error, could you try to replace the line "unset CFLAGS"
from src_projects/pytorch/build_rocm.sh and then run
to see whether pytorch would now build ok. |
@lamikr did just that and the build still fails. Though to me the build works much better with the APT packages removed; here's the list. The commands were required because just using APT failed on a dependency conflict.
dpkg -P rocm-core comgr hip-runtime-amd hipsparse hipsparse-dev hiptensor miopen-hip openmp-extras-runtime comgr hsa-rocr hsakmt-roct-dev rocblas rocm-language-runtime rocm-llvm rocm-ocl-icd rocm-opencl rocm-opencl-runtime rocminfo rocm-device-libs rocsparse hsa-rocr-dev hsa-rocr miopen-hip-dev hiptensor-dev rocblas-dev hipcc rocsparse-dev rocm-core rocm-device-libs
dpkg -P libamd-comgr-dev libamd-comgr2 libhsa-runtime-dev libhsa-runtime64-1 libamdhip64-5 libamd-comgr-dev libamdhip64-dev
Below are the most meaningful parts of the build failure; could it be an outdated gcc lib?
/usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fintrin.h: In function ‘void fbgemm::internal::SparseDenseInt8MMAvx512(int, const std::unique_ptr<fbgemm::BCSRMatrix<> >&, const uint8_t*, int, int32_t*, uint8_t*, int, fbgemm::trRequantizationParams_t&, bool, int, int) [with bool FUSE_RELU = false; fbgemm::QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL]’:
[5513/8176] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx2.dir/src/FbgemmI8DepthwiseAvx2.cc.o |
Does the ./install_deps.sh command work for you? Because it's a Debian-based distro, not Ubuntu. |
Thanks for the patience and support. It's not like I've never compiled software before. After previously clearing the git source tree and compiled products I had not re-run
Curiously, it picked up at 028_ instead of continuing with 039_ where it last failed. |
after running
I've now run the below and await the next failure or completion. I assume it makes sense to restart from scratch, but I'll give this a chance. |
Okay, so that failed too. Now restarting with
I assume that makes sense, to have a clean slate. |
FYI: this is Debian Linuxmint, so, support is great. (see below) However
It failed again at 039_02_pytorch. I've checked for any updatable python packages, to no avail. Some python packages updated, but in the end the build process still failed the same way.
.....
/usr/lib/gcc/x86_64-linux-gnu/12/include/emmintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’:
FOR INSTALL DEPS
|
So, iterating over your initial request: the tests you asked me to build all build fine, except for the
This seems unrelated to the prior failure of pytorch to build. Correct? |
Manually building shows some dependencies are not okay. The broken dependency is not due to packages installed on Debian/Linux Mint.
|
Thanks for the updates. Hopefully install_deps.sh fixed something, even though the pytorch build still seems to fail for you. We need to get that fixed because some other apps depend on it. I actually added there today, just in case, also a call to "apt update" before calling "apt install". Anyway, I have now installed Linux Mint Debian Edition (LMDE 6) in a virtual machine and started the build. It is not finished yet, but hopefully I can reproduce your error. Steps I have used after install so far:
|
Ok, I think I can now resolve this. It seems that GCC 12.2.0 on LMDE6 has some bug, or is even stricter about treating warnings as errors than gcc 13/14 on Fedora. Earlier I had -Wno-error=maybe-uninitialized in the CMAKE_CXX_FLAGS. I think these warnings are fixed in a newer fbgemm, which is a sublibrary that pytorch uses, but this should be harmless and the easiest way to fix it for now. I will push the fix out soon. |
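The workaround reads as one extra entry in CMAKE_CXX_FLAGS; exactly how the project's build scripts thread it through may differ, so treat this as a sketch:

```shell
# Demote gcc's maybe-uninitialized diagnostic from error back to warning
# by appending it to whatever CMAKE_CXX_FLAGS already holds:
CMAKE_CXX_FLAGS="${CMAKE_CXX_FLAGS:-} -Wno-error=maybe-uninitialized"
echo "$CMAKE_CXX_FLAGS"
```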
@commandline-be Fix for pytorch build on LMDE6 is now in. It should work for you if you just do ./babs.sh -up |
Trying that now. Also, I found this likely related PyTorch bug report, including a reference to a GCC bug. |
@commandline-be Did it work for you now? |
After starting over from scratch the build now finished without error.
which works as expected |
Huh, nice :-) !!! Is the Radeon VII working? I think you are the first one testing it. We have earlier only tested the AMD MI50, which has the same chipset. If you are able to, run some tests first.
rm -f ./hello_world
(results can then be plotted to a graph by editing plot_benchmarks.py and then running ./show_benchmark_results.sh) |
I have myself updated and tested the llama.cpp and vllm with the different deepseek-r1 models. |
@lamikr thanks, thus far everything I tested works. I've toyed a bit with the Stable Diffusion webui and that also works fine, though I doubt it will ever produce results on par with those online, given the Radeon VII. The card seems to be good enough for my current curiosity. This involves translation and text/document analytics for unformatted and formatted text sources including csv, tsv, json etc. sh build.sh |
@lamikr Not overspending effort on trying to understand it, I modified the plot_benchmarks.py file to include a commented path for new_results and appended a results path for AMD_RADEON_VII. This does not seem to work after running the benchmark. In the example below I've commented the appended line and uncommented where to find the results
|
The lines above result_filename_arr = [] need to be commented out. (They are just example lines I can copy/paste inside result_filename_arr if I want to see them in the graph.) So the easiest way for now is to move the line
The Radeon VII seems to have stood the test of time well, or was kind of ahead of its time when it was released. And now, in next year's models, they are kind of going back to a similar instruction set with the UDNA cards. I would not be surprised if those contained some kind of virtual execution ISA with an idea similar to what Nvidia has with their PTX (with the advantage of designing it from a fresh table while trying to predict what kinds of instructions game and AI GPUs will need for the next 10 years). |
@lamikr I'm amazed at how the RVII is still able to do actual useful work. I remember it was a one-of-a-kind product made to enable AMD to offer an entry level into compute. If I remember well, it is actually an MI25 with one feature missing. Looking at it, I notice the requirements.txt file contains pytorch and pytorch-vision, which I believe are also in the rocm_sdk_builder repository.
For the below, the GPU count is reported as either 0 or 1 with the same outcome.
Running the benchmark manually I now get
|
I'm an owner of a Radeon VII card; if I can help testing code to run well on it, let me know