Radeon VII #175
Hi, that would be really interesting! Most of the ROCm components seem to have had in-code support for the Radeon VII/gfx906 in place by default, and a couple of weeks ago I went through all the common places that typically require patching. I have tested that everything should now build for these cards, but I have not been able to test functionality on a VII. That said, if you have time it would be great if you could try the build and test it. These steps should help you get started.
Once the build has progressed past rocminfo and amd-smi, those commands would be a good way to start checking the build.
The HIP and OpenCL compiler tests should also be doable pretty soon (no need to wait for the whole build to finish).
Once the build has finished, if things work well, then PyTorch should also support your GPU.
If those work, then you can also build llama.cpp, stable-diffusion-webui and vllm with the command: ./babs.sh -b binfo/extra/ai_tools.blist All of those also have their own example apps that you can run either on the console or by starting their web server and connecting to it via browser. (I can help more later if needed) |
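Once the build reaches that point, a quick presence check of the key tools can be scripted. This is only a sketch: the helper name and the scratch-directory demo are mine; the tool names and the /opt/rocm_sdk_612 prefix come from this thread.

```shell
# Check that key ROCm SDK tools landed under the install prefix.
# check_tools is a hypothetical helper, not part of rocm_sdk_builder.
check_tools() {
  local prefix="$1"; shift
  local missing=0
  for tool in "$@"; do
    [ -x "$prefix/bin/$tool" ] || { echo "missing: $tool"; missing=1; }
  done
  return $missing
}

# Demo against a scratch prefix with fake binaries
# (a real run would use: check_tools /opt/rocm_sdk_612 rocminfo amd-smi hipcc):
prefix=$(mktemp -d)
mkdir -p "$prefix/bin"
for t in rocminfo amd-smi hipcc; do : > "$prefix/bin/$t"; chmod +x "$prefix/bin/$t"; done
check_tools "$prefix" rocminfo amd-smi hipcc && echo "all tools present"
```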
Thanks :-) I'll try that asap |
Hello @lamikr , Thank you for your amazing work! I am really glad I found this repo. I have two AMD MI60 cards (gfx906). I will also compile this repo and share test results with you! I am specifically interested in vLLM batch/concurrent inference speeds. So far, I was not able to compile vLLM with default installations of ROCm 6.2.2 and vLLM. There is also a composable_kernel based flash attention implementation here - https://github.com/ROCm/flash-attention (v2.6.3). This FA compiles fine with default ROCm 6.2.2 on Ubuntu 22.04, but the exllamav2 backend with Llama 3 8B started generating gibberish text (exllamav2 works fine without FA2, but it is very slow without it). I hope this repo fixes this gibberish text generation problem with FA2. Thanks again! |
Quick update. I did a fresh installation of Ubuntu 24.04.1 today, which takes around 6.5GB of SSD storage. It installs Nvidia GPU drivers by default. I assumed this repo would install AMD GPU drivers, but no, it did not. Probably this should be mentioned in the README with a brief description of how to install GPU drivers. So, I installed the AMD GPU drivers as follows:
Also, several packages were missing in Ubuntu which I had to install after seeing error messages from ./install_deps.sh.
Only after that was I able to run ./install_deps.sh without errors. Another piece of feedback: can you please include a global progress bar in the terminal logs showing how many packages have been built and how many remain? |
ok, I want to report an error that occurred while building the source code.
Attaching the full error output. Short info about my PC: I ran the following commands and they worked.
rocminfo correctly showed those two MI60 cards. The hipcc and opencl examples worked without errors. Please let me know if I need to install CUDA libraries, or otherwise how I can fix the error above. Thanks! |
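As a sanity check on agent enumeration, the gfx906 count can be pulled straight out of rocminfo output. The helper below is a hypothetical convenience, demoed here on canned text; a real run would pipe `rocminfo` into it and should report 2 for a dual-MI60 box.

```shell
# Count lines mentioning gfx906 (e.g. agent Name fields in rocminfo output).
count_gfx906() {
  grep -c 'gfx906'
}

# Real usage (assumes rocminfo on PATH): rocminfo | count_gfx906
# Demo on canned output:
printf 'Name: gfx906\nName: gfx906\nName: gfx900\n' | count_gfx906   # → 2
```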
@lamikr , I think the error I am seeing might be related to spack/spack#45411, but I am not sure how to implement the fix here. Let me know. Thanks! |
Quick update. The installation is working after I removed all Nvidia drivers and restarted my PC.
Now Ubuntu is using the X.Org Nouveau driver. |
Finally, the ROCm SDK was installed on my PC after 5 hours. It takes ~90GB of space in rocm_sdk_builder, 8.5GB in the triton folder, ~2GB in the lib/x86_64-linux-gnu folder (mostly LLVM) and ~20GB in the opt/rocm_sdk_612 folder - a total of 120GB of files! Is there a way to create an installable version of my current setup (all 120GB)? It is huge and time-consuming. For comparison, a ROCm installation from binaries takes around 30GB. |
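Per-directory totals like those quoted above come straight from `du`. This sketch defines a small reporting helper (the name is mine) and demos it on scratch directories rather than the real ~120GB tree; the real directory names would be builddir, src_projects, /opt/rocm_sdk_612, etc.

```shell
# Print size-in-KiB and path for each directory given.
report_usage() {
  for d in "$@"; do
    printf '%sK\t%s\n' "$(du -sk "$d" | cut -f1)" "$d"
  done
}

# Demo on scratch directories (placeholders for the real build tree):
root=$(mktemp -d)
mkdir -p "$root/builddir" "$root/src_projects"
head -c 4096 /dev/zero > "$root/builddir/obj.o"
report_usage "$root/builddir" "$root/src_projects"
```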
Here are the benchmark results. I think the flash attention test failed.
|
That error above is causing llama.cpp to not run any models on the GPU. Let me file a bug. |
@lamikr finally got around to doing the testing. Initially the build went smooth-ish after setting HIP_COMPILER=clang |
Hi, thanks for the reports. The flash attention support for gfx906 would need to be implemented in aotriton. Although I do not have a gfx906, I will start a new build for it with Ubuntu 24.04 and try to reproduce the build errors. If you have some fixes, are you able to make a pull request? |
hey @lamikr The build is on LinuxMint Debian Edition, if need be i can make pull requests |
I have multiple versions of it under src_projects directory
I am not sure what is causing it. Maybe the install directory /opt/rocm_sdk_612 should also be removed before starting a clean build. Let's try to reset everything and then start a fresh build.
I have not yet solved the llama.cpp error with gfx906, but I am trying to add more debugging for that to the next build. |
I can get as far as running the HIP and CL hello worlds, but cannot run the run-and-save-benchmarks script. Relevant excerpts from the CMake output:
-- MIGraphX is using hipRTC
CMake Error in src/py/CMakeLists.txt:
in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
CMake Warning (dev) at /opt/rocm_sdk_612/share/rocmcmakebuildtools/cmake/ROCMTest.cmake:230 (install): RPATH entries for target 'test_verify' will not be escaped in the
-- Generating done (2.5s) |
I have now made a couple of docker images from the quite new rocm_sdk_builder for different GPU architectures.
The docker image for CDNA cards supports gfx906 and works at least with the MI50, so I believe it could also work with the Vega VII. The problem is that even xz-compressed, these images are about 6GB (and uncompressed about 50GB). These images have every application from rocm sdk builder included. Do you know whether it is possible to upload xz-compressed docker images to Docker Hub? |
@Said-Akbar, @cb88 and @commandline-be Here are the commands I used to import and run it: (tmpdir was needed because the image is so big that the import would fail if the regular /tmp memory dir is used)
Now I would need some location to upload it to. |
Hello @lamikr , I checked Docker Hub and they have a limit of 2GB. Since that is not enough for this container, I suggest you create a compressed file and upload it to a Google Drive. Each new gmail account has 15GB of free storage. Then you can make the folder public and share it with us. Thanks! |
Hi @Said-Akbar, I did what you suggested :-) https://drive.google.com/drive/folders/1XnoSvL41XhrKT_5NbBSrUZ_1LaVpQ-xb Let me know how it works. I also put there a file with some instructions, as at least I needed to change the TMPDIR location during import to avoid running out of memory. |
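The TMPDIR workaround mentioned in those instructions amounts to pointing import scratch space at a disk-backed directory instead of the tmpfs-backed /tmp. A minimal sketch; the podman line in the comment is illustrative, and the archive name is a placeholder.

```shell
# Create a scratch directory on real disk (under $HOME rather than tmpfs /tmp)
# and point TMPDIR at it before importing the large image.
bigtmp=$(mktemp -d -p "${HOME:-/tmp}")
export TMPDIR="$bigtmp"

# The real import would then be something like (podman assumed):
#   TMPDIR="$bigtmp" podman load -i rocm_sdk_cdna.tar.xz
echo "TMPDIR=$TMPDIR"
```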
Thank you! I will try it tomorrow. |
GitHub offers a free docker registry for public repos. The size limit is 10GiB per layer. |
Do you have experience doing docker builds on GitHub, and what does "layer" mean here? I would think that the end-image size is at least 10-50 GB uncompressed, and compressed it would be about 2-6 GB. At the moment I am doing the build by using the files in the folder
I have split the Dockerfile into multiple parts because I hit some kind of image merge error at the end when I tried to build too many projects in one step. At the moment I first comment everything out in the Dockerfile after the line `
and then I will call, for example, a script:
Once that step finishes and the image is saved, I uncomment the line asking to build the second set and then launch the ./build_rocm_sdk_container_rdna3.sh script again. Maybe there is some better/smarter way of doing the commits between RUN commands that I am missing. Something similar to calling "podman tag" directly from the Dockerfile between RUN commands. Or alternatively there could be a script that adds those RUN commands dynamically to the Dockerfile after each successful build phase. That would be nice to try on GitHub. |
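The phase-by-phase driver with error checking between steps can be sketched as a tiny shell wrapper; the phase commands below are stand-ins for the real ./build_rocm_sdk_container_rdna3.sh invocations.

```shell
# Run one build phase; stop the chain if it fails.
run_phase() {
  echo "phase: $*"
  if "$@"; then
    echo "ok: $*"
  else
    echo "FAILED: $*" >&2
    return 1
  fi
}

# Chain phases so the driver stops at the first failure
# ('true' and 'echo' stand in for the real per-set build scripts):
run_phase true && run_phase echo "second set" && echo "all phases ok"
```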
Most (if not all) Dockerfile instructions result in a new layer being created. You may examine the layers of existing images using
I guess the size limit applies to the compressed size of each layer, but I am not very sure.
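One real command for examining layers is `docker history <image>` (or `podman history`), which lists each layer with its size; summing the size column gives the image total. The summing helper below is my own convenience, demoed on canned byte counts (the image name in the comment is hypothetical).

```shell
# Sum a column of per-layer byte counts from stdin.
sum_layer_bytes() {
  awk '{ s += $1 } END { print s }'
}

# Real usage (docker assumed), after converting the human-readable SIZE
# column to plain bytes:
#   docker history myimage   # then feed the byte counts to sum_layer_bytes
# Demo on canned per-layer byte counts:
printf '100\n250\n50\n' | sum_layer_bytes   # → 400
```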
That's bad practice even if it succeeds in merging, since a massive image layer is very hard to upload/download. Splitting build steps into multiple smaller layers is preferable. To resume from an intermediate layer, use the layer id (hash) in FROM:
FROM 1234567890ab
# ... remaining steps ...
and build it.
For example:

```dockerfile
FROM ubuntu:24.04
# ... omitted ...
WORKDIR /
RUN apt update && apt install -y git git-lfs sudo vim
RUN git clone --recursive https://github.com/lamikr/rocm_sdk_builder.git
WORKDIR /rocm_sdk_builder
RUN git checkout master
RUN ./docs/notes/containers/scripts/preconfig_ubuntu.sh
RUN ./docs/notes/containers/scripts/select_target_gpus.sh ${ENV_VAR__TARGET_GPU_CFG_FILE_ALL}
```
I noticed that your Dockerfile does some cleanup jobs in the final layers. That is useless, as all deleted files remain in the previous layers (each layer, once created, is immutable). |
Thanks for the suggestions... So basically you suggest that the Dockerfile itself only has the commands for creating the very base image, and all the other "docker run" commands that follow are called from a shell script instead of being added to the Dockerfile itself. I did not realize it could be done that way. I had thought I would need to modify the Dockerfile dynamically from the script between each build step by using "echo "MY command" >> Dockerfile". Error checking needs to be added between each step anyway so that the script stops if one of the steps fails. The reason for the cleanup task at the end is that this way the exported image created with "podman export" is smaller. It will contain only the files that are in the image at the final step. The image now shared on the gmail fileshare was made that way; it let me reduce the size of the exported image by tens of gigabytes. A user of the docker image can get those files back if they want by using babs.sh commands. It even allows updating and rebuilding it partially with commands like:
|
The multi-stage build flattens everything into a single layer:

```dockerfile
FROM ubuntu:24.04 AS builder
# ... all build steps ...

FROM scratch
COPY --from=builder / /
# Now you get a single-layer image
```

That being said, the best practice for building a Docker image usually follows these practices:
If you'd like to set up GitHub Actions to build Docker images, there is a limit that "each job in a workflow can run for up to 6 hours of execution time". To work around the limit, splitting the Dockerfile is needed anyway so that the build workflow can be split into multiple jobs - several jobs build some groups of dependencies, and they are finally aggregated in jobs that build the last monsters (pytorch, etc.). This strategy has other advantages, e.g. if two or more components do not depend on each other, they can be built in different jobs (thus on different GHA runners) simultaneously, cutting the total build time down greatly. The current build process of the project follows a linear dependency chain (components are built one by one). Could you make a (rough) dependency graph of all components built by this project? Using such information, building some components simultaneously would be possible, and I am willing to help write proper Dockerfiles and GHA workflows. |
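The fan-out/fan-in job layout described above can be miniaturized with plain shell background jobs (the component names are hypothetical): independent components build concurrently, then the aggregate step waits for both before building the final monster.

```shell
# Stand-in for a component build job.
build() { echo "building $1"; echo "done $1"; }

# Fan-out: independent components build in parallel background jobs.
build compA & build compB &

# Fan-in: block until both background builds finish, then build the
# component that depends on them.
wait
build final_monster
```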
Thanks for this guide. Sadly I've not managed to push beyond the 'build failed: roctracer' situation. It fails on 'hipGetDevicePropertiesR0600' with an undefined reference at MatrixTranspose_test.cpp:(.text+0x322) and again at MatrixTranspose_test.cpp:(.text+0x360). I'm building this on Debian, which afaik is not throwing any compatibility issue but does fail to build. It seems also that the build script is not aligned with the actual code tree, in that I do not find env_rocm.sh in /opt/rocm/bin for example, but I do find it under ./binfo/env etc. Please assist
|
After running babs -rs the result was the same; the file env_rocm.sh was not found in /opt/rocm/bin before or after. |
@commandline-be It should be by default in
(not in /opt/rocm/bin/env_rocm.sh, as /opt/rocm folder is usually used by the AMD's own rocm builds) Can you check whether you have /opt/rocm_sdk_612/bin/env_rocm.sh? If yes, then you should also find some example apps to test on. For example:
If not, let's try to do with smaller steps to find out what is the problem. The env-variable script should be installed alredy by the first package, so we can try to build only that one.
After these commands you should have that script installed and only a couple of other files. |
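The check above boils down to looking for the env script under the SDK prefix rather than under /opt/rocm. A small sketch (the helper name is mine; it is demoed on a scratch prefix, while a real run would pass /opt/rocm_sdk_612):

```shell
# Report whether env_rocm.sh exists under a given install prefix.
find_env_script() {
  if [ -f "$1/bin/env_rocm.sh" ]; then
    echo "found: $1/bin/env_rocm.sh"
  else
    echo "not found under $1"
    return 1
  fi
}

# Real usage: find_env_script /opt/rocm_sdk_612
# Demo with a scratch prefix:
demo=$(mktemp -d); mkdir -p "$demo/bin"; : > "$demo/bin/env_rocm.sh"
find_env_script "$demo"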
@Rongronggg9 Thanks for the feedback. I will try to do the Dockerfile and GitHub Actions for the GitHub build now... I was thinking something like this for the base image that would be built by the first command. It should stay under the time and space limits for a single layer.
And then for GitHub Actions, I could first try something like this: build the base image first with the Dockerfile, and then run a single action to build llvm with a separate command after that, to create a second layer.
|
Progress, it seems ... [ 0%] Building C object external/llvm-project/llvm/lib/Support/BLAKE3/CMakeFiles/LLVMSupportBlake3.dir/blake3.c.o and eventually [ 0%] Built target MLIRTableGen |
After apt install miopen-hip-dev and then babs -b again, I now get:
Dependencies file "external/llvm-project/llvm/lib/Support/CMakeFiles/LLVMSupport.dir/circular_raw_ostream.cpp.o.d" is newer than depends file "/home/user/src/rocm_sdk_builder/builddir/031_01_miopen_rocMLIR/external/llvm-project/llvm/lib/Support/CMakeFiles/LLVMSupport.dir/compiler_depend.internal".
And also, after apt install llvm-dev:
make[2]: Nothing to be done for 'external/llvm-project/llvm/tools/mlir/lib/TableGen/CMakeFiles/MLIRTableGen.dir/build'. |
You should not need to install the rocm deb files; those are most likely messing up the build somehow. I say that because in the AmdDeviceLibsIncGen.py file that was in your error message, line 25 has the following: def generate(outputPath: Path, rocmPath: Path, libs: List[str]) -> None: Can you print the output of the following commands:
|
Thanks for the informative feedback.
should I consider running a |
I've now removed anything related to rocm installed on the OS by the package manager. The build now appears to continue. Thanks. I had assumed the build process would not ingest anything from the OS. |
After a long build it now fails at pytorch. I report the output below, should it matter. I'm restarting the entire build after a clean (-rs). cc1plus: all warnings being treated as errors --- before this, FAILED is reported: [41/2619] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/UtilsAvx512.cc.o |
Thanks, this was good information. Hard to know what exactly caused that. Maybe there were some libraries or header files under the /usr directory that confused it. I wish it had been some environment variable like ROCM_HOME, because that I could have solved/fixed more easily by redefining it in the envsetup.sh that babs uses. Any chance you could check which deb files you removed? (history command)
@commandline-be If you are still seeing the same error, could you try to replace the line "unset CFLAGS"
from src_projects/pytorch/build_rocm.sh and then run
to see whether pytorch would now build ok. |
@lamikr did just that and the build still fails. Though to me the build works much better with the APT packages removed; here's the list. The commands were required because just using APT failed on a dependency conflict.
dpkg -P rocm-core comgr hip-runtime-amd hipsparse hipsparse-dev hiptensor miopen-hip openmp-extras-runtime comgr hsa-rocr hsakmt-roct-dev rocblas rocm-language-runtime rocm-llvm rocm-ocl-icd rocm-opencl rocm-opencl-runtime rocminfo rocm-device-libs rocsparse hsa-rocr-dev hsa-rocr miopen-hip-dev hiptensor-dev rocblas-dev hipcc rocsparse-dev rocm-core rocm-device-libs
dpkg -P libamd-comgr-dev libamd-comgr2 libhsa-runtime-dev libhsa-runtime64-1 libamdhip64-5 libamd-comgr-dev libamdhip64-dev
Below are the most meaningful parts of the build failure; could it be an outdated gcc lib?
/usr/lib/gcc/x86_64-linux-gnu/12/include/avx512fintrin.h: In function ‘void fbgemm::internal::SparseDenseInt8MMAvx512(int, const std::unique_ptr<fbgemm::BCSRMatrix<> >&, const uint8_t*, int, int32_t*, uint8_t*, int, fbgemm::trRequantizationParams_t&, bool, int, int) [with bool FUSE_RELU = false; fbgemm::QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL]’:
[5513/8176] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx2.dir/src/FbgemmI8DepthwiseAvx2.cc.o |
Does the ./install_deps.sh command work for you? Because it's a Debian-based distro, not Ubuntu. |
Thanks for the patience and support. It's not like I've never compiled software before. After previously clearing the git source tree and compiled products I had not re-run
Curiously, it picked up at 028_ instead of continuing with 039_ where it last failed. |
after running
I've now run the below and await the next failure or completion. I assume it makes sense to restart from scratch, but I'll give this a chance. |
Okay, so that failed too. Now restarting with
I assume that makes sense, to have a clean slate. |
FYI: this is Debian Linuxmint, so, support is great. (see below) However
It failed again at 039_02_pytorch. I've checked for any updatable python packages, to no avail. Some python packages updated, but in the end the build process still failed the same way.
.....
/usr/lib/gcc/x86_64-linux-gnu/12/include/emmintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’:
FOR INSTALL DEPS
|
So, iterating over your initial request: the tests you asked me to build all build fine, except for the
This seems unrelated to the prior failure of pytorch to build. Correct? |
Manually building shows some dependencies are not okay. The broken dependency is not due to packages installed on Debian/Linux Mint.
|
Thanks for the updates. Hopefully install_deps.sh fixed something, even though the pytorch build still seems to fail for you. We need to get that fixed because some other apps depend on it. I actually added there today, just in case, also a call to "apt update" before calling "apt install". Anyway, I have now installed Linux Mint Debian Edition (LMDE 6) in a virtual machine and started the build. It is not finished yet, but hopefully I can reproduce your error. Steps I have used after install so far:
|
Ok, I think I can now resolve this. It seems that GCC 12.2.0 on LMDE6 has some bug, or is even stricter about treating warnings as errors than gcc 13/14 on Fedora. Earlier I had -Wno-error=maybe-uninitialized in the CMAKE_CXX_FLAGS. I think these warnings are fixed in a newer fbgemm, which is a sublibrary that pytorch uses, but this should be harmless and the easiest way to fix it for now. I will push the fix out soon. |
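The workaround reads as one extra entry in CMAKE_CXX_FLAGS; exactly how the project's build scripts thread it through may differ, so treat this as a sketch:

```shell
# Demote gcc's maybe-uninitialized diagnostic from error back to warning
# by appending it to whatever CMAKE_CXX_FLAGS already holds:
CMAKE_CXX_FLAGS="${CMAKE_CXX_FLAGS:-} -Wno-error=maybe-uninitialized"
echo "$CMAKE_CXX_FLAGS"
```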
@commandline-be Fix for pytorch build on LMDE6 is now in. It should work for you if you just do ./babs.sh -up |
Trying that now. Also, I found this likely related PyTorch bug report, including a reference to a GCC bug. |
@commandline-be Did it work for you now? |
After starting over from scratch the build now finished without error.
which works as expected |
Huh, nice :-) !!! Is the Radeon VII working? I think you are the first one testing it. We have earlier only tested the AMD MI50, which has the same chipset. If you are able to, run some tests first.
rm -f ./hello_world
(results can then be plotted to a graph by editing plot_benchmarks.py and then running ./show_benchmark_results.sh) |
I have myself updated and tested the llama.cpp and vllm with the different deepseek-r1 models. |
@lamikr thanks, thus far everything I tested works. I've toyed a bit with the Stable Diffusion webui and that also works fine, though I doubt it will ever produce results on par with those online, given the Radeon VII. The card seems to be good enough for my current curiosity. This involves translation and text/document analytics for unformatted and formatted text sources including csv, tsv, json etc. sh build.sh |
@lamikr Not overspending effort on trying to understand it, I modified the plot_benchmarks.py file to include a commented path for new_results and appended a results path for AMD_RADEON_VII. This does not seem to work after running the benchmark. In the example below I've commented the appended line and uncommented where to find the results
|
The lines above result_filename_arr = [] need to be commented out. (They are just example lines I can copy/paste inside result_filename_arr if I want to see them in the graph.) So the easiest way for now is to move the line
The Radeon VII seems to have stood the test of time well, or was kind of ahead of its time when it was released. And now, in next year's models, they are kind of going back to a similar instruction set with the UDNA cards. I would not be surprised if those contained some kind of virtual execution ISA with an idea similar to what Nvidia has with their PTX (with the advantage of designing it from a fresh table while trying to predict what kinds of instructions game and AI GPUs will need for the next 10 years). |
@lamikr I'm amazed at how the RVII is still able to do actual useful work. I remember it was a one-of-a-kind product made to enable AMD to offer an entry level into compute. If I remember well, it is actually an MI25 with one feature missing. Looking at it, I notice the requirements.txt file contains pytorch and pytorch-vision, which I believe are also in the rocm_sdk_builder repository.
For the below, the GPU count is reported as either 0 or 1 with the same outcome.
Running the benchmark manually I now get
|
I'm an owner of a Radeon VII card; if I can help testing code to run well on it, let me know