
Llama.cpp fails on Fedora AMD - ROCm error #732

Open
vpavlin opened this issue Feb 4, 2025 · 8 comments

@vpavlin

vpavlin commented Feb 4, 2025

Hi folks:-)

I am not saying this is a Ramalama issue, but I would appreciate your help/guidance, because this is my first endeavour with local GPUs :-)

I just got my GMKtec K11 machine (https://www.gmktec.com/products/amd-ryzen%E2%84%A2-9-8945hs-nucbox-k11) and installed Fedora 41 on it + podman + ramalama (installed via the curl .. | sh method from the README).

$ ramalama --version
ramalama version 0

This is the result of ramalama run:

vpavlin@localhost:~$ ramalama run llama3.2
> HI                                                                                                                                                                                                                                                                                                                                                                  
ggml_cuda_compute_forward: RMS_NORM failed
ROCm error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at /llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:2313
  err
/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:72: ROCm error
Memory critical error by agent node-0 (Agent handle: 0x10a3dc10) on address 0x7f3c19800000. Reason: Memory in use. 

It successfully finds the GPU (very cool)

...
load_tensors: offloading 28 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors:   CPU_Mapped model buffer size =   308.23 MiB
load_tensors:        ROCm0 model buffer size =  1918.35 MiB
...

but then fails when I try to prompt the model.

$ podman images
REPOSITORY               TAG         IMAGE ID      CREATED       SIZE
quay.io/ramalama/rocm    latest      8875feffdb87  16 hours ago  6.92 GB
docker.io/ollama/ollama  latest      f1fd985cee59  2 weeks ago   3.31 GB

Any ideas/thoughts are appreciated :) Happy to file an issue against llama.cpp, I just wanted to make sure I am not missing something obvious (like a missing package or something).

@ericcurtin
Collaborator

I think that GPU is gfx1103. Can you check if the relevant file is in the container in /opt? (It should have gfx1103 in the filename.)
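For example, something along these lines should show whether the image ships gfx1103 kernels (a sketch, assuming the quay.io/ramalama/rocm:latest image; if it follows the usual ROCm layout, rocBLAS keeps its per-target kernel files under /opt/rocm-*/lib/rocblas/library):

$ podman run --rm quay.io/ramalama/rocm:latest find /opt -iname '*gfx1103*'
$ podman run --rm quay.io/ramalama/rocm:latest sh -c 'ls /opt/rocm-*/lib/rocblas/library/ | grep -i gfx'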

@ericcurtin
Collaborator

You could simply be running out of VRAM; how much VRAM does your GPU have?

@ericcurtin
Collaborator

If:

llama3.2:1b

works, you are likely running out of VRAM. I think the default is 3b.
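(That would just be, following the same model:tag form as the other ramalama commands in this thread:)

$ ramalama run llama3.2:1b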

@vpavlin
Author

vpavlin commented Feb 4, 2025

ramalama --debug run llama3.2
run_cmd:  podman inspect quay.io/ramalama/rocm:0
Working directory: None
Ignore stderr: False
Ignore all: True
exec_cmd:  podman run --rm -i --label RAMALAMA --security-opt=label=disable --name ramalama_Hpxh57JSxc --pull=newer -t --device /dev/dri --device /dev/kfd -e HIP_VISIBLE_DEVICES=0 --mount=type=bind,src=/home/vpavlin/.local/share/ramalama/models/ollama/llama3.2:latest,destination=/mnt/models/model.file,ro quay.io/ramalama/rocm:latest llama-run -c 2048 --temp 0.8 -v --ngl 999 /mnt/models/model.file
Loading modelggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no                                                                                                                                                                                                                                                                                                               
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1103 (0x1103), VMM: no, Wave Size: 32
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon Graphics) - 14364 MiB free

It's an iGPU and it seems like it gets 50% of the RAM.

I am pulling qwen2.5:1.5b to check a smaller model, but my internet sucks, so it is gonna take a few more minutes....:D

@vpavlin
Author

vpavlin commented Feb 4, 2025

Qwen still fails with the same error.

Not sure if these are the files you were looking for?

[root@45b541dd4f49 /]# find /opt -iname "*gfx1103*"
/opt/rocm-6.3.1/lib/llvm/lib/libdevice/libhostexec-gfx1103.bc
/opt/rocm-6.3.1/lib/llvm/lib/libomptarget-amdgpu-gfx1103.bc
/opt/rocm-6.3.1/lib/llvm/lib-debug/libdevice/libhostexec-gfx1103.bc
/opt/rocm-6.3.1/lib/llvm/lib-debug/libomptarget-amdgpu-gfx1103.bc
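In case it matters: those are only LLVM bitcode files. If the image ships no rocBLAS/Tensile kernel files for gfx1103, that would line up with the "invalid device function" error, since HIP raises it when a kernel was never compiled for the device's target. One workaround I have seen mentioned for RDNA3 iGPUs (an assumption on my side, not something from the ramalama docs) is to override the reported target with HSA_OVERRIDE_GFX_VERSION, reusing the exec_cmd from the debug output above, roughly:

$ podman run --rm -i --device /dev/dri --device /dev/kfd \
    -e HIP_VISIBLE_DEVICES=0 -e HSA_OVERRIDE_GFX_VERSION=11.0.0 \
    --mount=type=bind,src=/home/vpavlin/.local/share/ramalama/models/ollama/llama3.2:latest,destination=/mnt/models/model.file,ro \
    quay.io/ramalama/rocm:latest llama-run -c 2048 --ngl 999 /mnt/models/model.file

(11.0.0 corresponds to gfx1100; the right value would be whichever gfx11 target the rocBLAS library directory in the image actually contains.)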

@vpavlin
Author

vpavlin commented Feb 4, 2025

I also noticed this

  --gpu                 offload the workload to the GPU (default: False)

and since I am not specifying --gpu, it should only use the CPU by default, no? How do I turn off GPU offloading?
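(Related thought: in the exec_cmd from the debug output, ramalama passes --ngl 999 to llama-run, and --ngl is the number of layers offloaded to the GPU, so rerunning the same command with --ngl 0 should keep everything on the CPU. Whether ramalama itself exposes an --ngl option I have not checked; ramalama run --help would tell. A trimmed-down version of that command:)

$ podman run --rm -i --device /dev/dri --device /dev/kfd \
    --mount=type=bind,src=/home/vpavlin/.local/share/ramalama/models/ollama/llama3.2:latest,destination=/mnt/models/model.file,ro \
    quay.io/ramalama/rocm:latest llama-run -c 2048 --temp 0.8 --ngl 0 /mnt/models/model.file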

@vpavlin
Author

vpavlin commented Feb 4, 2025

There is something generally weird happening - I checked the BIOS and there was actually only 3GB of VRAM assigned, so I bumped it up to 16GB (out of 32GB RAM total in the machine).

I can see the VRAM available now:

[screenshot attached]

But it seems both ollama/ollama:rocm and ramalama result in GTT being used (which is now only 8GB, rather than the number I reported above, 14364 MiB free).

Again, this is my first experience with GPUs in general, so it is all very confusing - feel free to send me somewhere else:D
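(If it helps anyone debugging along: the VRAM/GTT split should also be visible from inside the container with something like the following, assuming rocm-smi is included in the rocm image:)

$ podman run --rm --device /dev/dri --device /dev/kfd quay.io/ramalama/rocm:latest \
    rocm-smi --showmeminfo vram gtt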

@vpavlin
Author

vpavlin commented Feb 4, 2025

Curious if the memory error might be related to this: ollama/ollama#5471 (comment)
