Fedora 41: CUDA version mismatch (12.8.0 vs 12.7): CUDA error: the provided PTX was compiled with an unsupported toolchain. #759
Comments
@rhatdan @bmahabirbu have reported this before. I would suggest trying to push as much of the userspace into the container as possible, maybe even avoiding the NVIDIA Container Toolkit. Theoretically, the only part of the stack that we need to run outside of the container is the kernel modules, which shouldn't suffer from a version mismatch. But I don't know if we can do this in practice, so it's just an idea.
Thanks for the detailed analysis and for showing it's CUDA 12.7 vs 12.8 on the guest and host.
The workaround is to use an older CUDA image:
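A minimal sketch of that workaround, assuming an older, 12.7-based tag exists under quay.io/ramalama/cuda (the tag name here is a placeholder, not a real tag):

```bash
# Pin an older CUDA image instead of :latest so the container's CUDA
# runtime matches the 12.7 driver stack on the Fedora 41 host.
ramalama --image quay.io/ramalama/cuda:<older-12.7-tag> run llama3.2
```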
Let me explain what's going on:

To run this program, save it to a file called hello_world.py and run:

```
$ python hello_world.py
Hello, World!
```

This should output "Hello, World!" to the console.

**Alternative Version: Using a Single Statement**

If you prefer a more concise version, you can use a single statement:

```python
# hello_world.py
print("Hello, World!")
```

This will produce the same output as the previous example.
Based on the above experiment, it looks like it might work as long as the following is preserved:
That is going to be kind of hard to maintain. I believe you can do `--ngl 0`. This seems to work on my laptop without loss of speed.
@rhatdan it turns out that `--ngl 0` indeed doesn't use the GPU for me. Could be your laptop is just as fast on CPU! Also, using `--ngl 0` doesn't invoke the CUDA driver, so that's why it was working even though there was a CUDA mismatch. Thanks @dwrobel for documenting the issue! Another thing to note: following the documentation in RamaLama for setting up CUDA, you have to repeat the steps after every driver update.
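For reference, a sketch of that CPU-only invocation; judging from the debug log below, RamaLama forwards `--ngl` (the number of layers to offload to the GPU) to `llama-run`, so 0 should keep inference entirely on the CPU (assuming `ramalama run` exposes `--ngl` directly, as the discussion above suggests):

```bash
# Offload 0 layers to the GPU: the CUDA driver is never invoked,
# so a host/container CUDA version mismatch can't bite.
ramalama run --ngl 0 llama3.2
```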
I would log these issues against places like https://github.com/NVIDIA/nvidia-container-toolkit; they might have advice on how to make these things more portable.
Sometimes I wish CUDA was more like ROCm. ROCm is just a couple of userspace components in the container; poke a few holes in the container and you are done.
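For comparison, a sketch of what "poking a few holes" means for ROCm; `/dev/kfd` and `/dev/dri` are the standard AMD compute/render device nodes, and the image name is illustrative:

```bash
# All of ROCm userspace (HIP runtime, libraries) lives in the image;
# the host only needs the amdgpu kernel driver and these device nodes.
podman run --rm -it \
  --device /dev/kfd \
  --device /dev/dri \
  --security-opt label=disable \
  quay.io/ramalama/rocm
```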
CUDA 12.8 is available as RPMs for host installation here: https://developer.download.nvidia.com/compute/cuda/repos/fedora41/x86_64/ (or whatever is required on the host; somebody should explore what the minimum host-side requirement is, since the less we install there the better). Maybe this is a documentation exercise, or a scripting one.
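A sketch of the host-side install from that repo; the repo file name follows NVIDIA's usual convention and the `nvidia-open` package name is an assumption (dnf5 on Fedora 41 may also want the newer `addrepo` subcommand syntax):

```bash
# Add NVIDIA's CUDA repo for Fedora 41 (repo file name assumed).
sudo dnf config-manager --add-repo \
  https://developer.download.nvidia.com/compute/cuda/repos/fedora41/x86_64/cuda-fedora41.repo

# Install only the driver/kernel-module side; the CUDA toolkit and
# libraries should come from the container image instead.
sudo dnf install nvidia-open
```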
Or we default to using Vulkan everywhere and allow users to substitute a CUDA container only if they override.
There is also a description of how to install it on Fedora at:
Yeah, but I suspect that guide will leave you with an old CUDA that's broken with the latest RamaLama CUDA container (I don't know for sure). I like Vulkan, but it's hard to ignore CUDA since it's what everyone wants to use in enterprise. And I suspect Vulkan will have a different set of issues.
@dwrobel try running with quay.io/ramalama/vulkan
See https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html, and then follow https://github.com/cncf-tags/container-device-interface#how-to-configure-cdi. According to these docs, there could be a way to set up the NVIDIA devices for the container without the NVIDIA Container Toolkit.
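A minimal sketch of a hand-written CDI spec in the format those docs describe; the device node paths and the library mount are assumptions that depend on the installed driver (normally `nvidia-ctk cdi generate` writes this file, but the CDI format itself is toolkit-independent):

```yaml
# /etc/cdi/nvidia.yaml (sketch)
cdiVersion: "0.6.0"
kind: "nvidia.com/gpu"
devices:
  - name: all
    containerEdits:
      deviceNodes:
        - path: /dev/nvidia0
        - path: /dev/nvidiactl
        - path: /dev/nvidia-uvm
containerEdits:
  mounts:
    - hostPath: /usr/lib64/libcuda.so.1   # driver userspace lib; version-specific
      containerPath: /usr/lib64/libcuda.so.1
      options: ["ro", "bind"]
```

With a spec like this in place, `podman run --device nvidia.com/gpu=all ...` (exactly what the exec_cmd below uses) can inject the devices without the toolkit's runtime hooks.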
It works and it offloads to the Intel GPU.

$ ramalama --gpu --debug --image quay.io/ramalama/vulkan run llama3.2 "Write a hello world program in python"
exec_cmd: podman run --rm -i --label RAMALAMA --security-opt=label=disable --name ramalama_6bTCwk0VsE --pull=newer -t --device /dev/dri --device nvidia.com/gpu=all -e CUDA_VISIBLE_DEVICES=0 --mount=type=bind,src=/home/dw/.local/share/ramalama/models/ollama/llama3.2:latest,destination=/mnt/models/model.file,ro quay.io/ramalama/vulkan llama-run -c 2048 --temp 0.8 -v --ngl 999 /mnt/models/model.file Write a hello world program in python
Loading model
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Graphics (RPL-S) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | warp size: 32 | matrix cores: none
llama_model_load_from_file_impl: using device Vulkan0 (Intel(R) Graphics (RPL-S)) - 31975 MiB free
llama_model_loader: loaded meta data with 30 key-value pairs and 255 tensors from /mnt/models/model.file (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Llama 3.2 3B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Llama-3.2
llama_model_loader: - kv 5: general.size_label str = 3B
llama_model_loader: - kv 6: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 7: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv 8: llama.block_count u32 = 28
llama_model_loader: - kv 9: llama.context_length u32 = 131072
llama_model_loader: - kv 10: llama.embedding_length u32 = 3072
llama_model_loader: - kv 11: llama.feed_forward_length u32 = 8192
llama_model_loader: - kv 12: llama.attention.head_count u32 = 24
llama_model_loader: - kv 13: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 14: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 15: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 16: llama.attention.key_length u32 = 128
llama_model_loader: - kv 17: llama.attention.value_length u32 = 128
llama_model_loader: - kv 18: general.file_type u32 = 15
llama_model_loader: - kv 19: llama.vocab_size u32 = 128256
llama_model_loader: - kv 20: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 22: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 26: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 28: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 29: general.quantization_version u32 = 2
llama_model_loader: - type f32: 58 tensors
llama_model_loader: - type q4_K: 168 tensors
llama_model_loader: - type q6_K: 29 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 1.87 GiB (5.01 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 128254 '<|reserved_special_token_246|>' is not marked as EOG
load: control token: 128249 '<|reserved_special_token_241|>' is not marked as EOG
load: control token: 128246 '<|reserved_special_token_238|>' is not marked as EOG
load: control token: 128243 '<|reserved_special_token_235|>' is not marked as EOG
load: control token: 128242 '<|reserved_special_token_234|>' is not marked as EOG
load: control token: 128241 '<|reserved_special_token_233|>' is not marked as EOG
load: control token: 128240 '<|reserved_special_token_232|>' is not marked as EOG
load: control token: 128235 '<|reserved_special_token_227|>' is not marked as EOG
load: control token: 128231 '<|reserved_special_token_223|>' is not marked as EOG
load: control token: 128230 '<|reserved_special_token_222|>' is not marked as EOG
load: control token: 128228 '<|reserved_special_token_220|>' is not marked as EOG
load: control token: 128225 '<|reserved_special_token_217|>' is not marked as EOG
load: control token: 128218 '<|reserved_special_token_210|>' is not marked as EOG
load: control token: 128214 '<|reserved_special_token_206|>' is not marked as EOG
load: control token: 128213 '<|reserved_special_token_205|>' is not marked as EOG
load: control token: 128207 '<|reserved_special_token_199|>' is not marked as EOG
load: control token: 128206 '<|reserved_special_token_198|>' is not marked as EOG
load: control token: 128204 '<|reserved_special_token_196|>' is not marked as EOG
load: control token: 128200 '<|reserved_special_token_192|>' is not marked as EOG
load: control token: 128199 '<|reserved_special_token_191|>' is not marked as EOG
load: control token: 128198 '<|reserved_special_token_190|>' is not marked as EOG
load: control token: 128196 '<|reserved_special_token_188|>' is not marked as EOG
load: control token: 128194 '<|reserved_special_token_186|>' is not marked as EOG
load: control token: 128193 '<|reserved_special_token_185|>' is not marked as EOG
load: control token: 128188 '<|reserved_special_token_180|>' is not marked as EOG
load: control token: 128187 '<|reserved_special_token_179|>' is not marked as EOG
load: control token: 128185 '<|reserved_special_token_177|>' is not marked as EOG
load: control token: 128184 '<|reserved_special_token_176|>' is not marked as EOG
load: control token: 128180 '<|reserved_special_token_172|>' is not marked as EOG
load: control token: 128179 '<|reserved_special_token_171|>' is not marked as EOG
load: control token: 128178 '<|reserved_special_token_170|>' is not marked as EOG
load: control token: 128177 '<|reserved_special_token_169|>' is not marked as EOG
load: control token: 128176 '<|reserved_special_token_168|>' is not marked as EOG
load: control token: 128175 '<|reserved_special_token_167|>' is not marked as EOG
load: control token: 128171 '<|reserved_special_token_163|>' is not marked as EOG
load: control token: 128170 '<|reserved_special_token_162|>' is not marked as EOG
load: control token: 128169 '<|reserved_special_token_161|>' is not marked as EOG
load: control token: 128168 '<|reserved_special_token_160|>' is not marked as EOG
load: control token: 128165 '<|reserved_special_token_157|>' is not marked as EOG
load: control token: 128162 '<|reserved_special_token_154|>' is not marked as EOG
load: control token: 128158 '<|reserved_special_token_150|>' is not marked as EOG
load: control token: 128156 '<|reserved_special_token_148|>' is not marked as EOG
load: control token: 128155 '<|reserved_special_token_147|>' is not marked as EOG
load: control token: 128154 '<|reserved_special_token_146|>' is not marked as EOG
load: control token: 128151 '<|reserved_special_token_143|>' is not marked as EOG
load: control token: 128149 '<|reserved_special_token_141|>' is not marked as EOG
load: control token: 128147 '<|reserved_special_token_139|>' is not marked as EOG
load: control token: 128146 '<|reserved_special_token_138|>' is not marked as EOG
load: control token: 128144 '<|reserved_special_token_136|>' is not marked as EOG
load: control token: 128142 '<|reserved_special_token_134|>' is not marked as EOG
load: control token: 128141 '<|reserved_special_token_133|>' is not marked as EOG
load: control token: 128138 '<|reserved_special_token_130|>' is not marked as EOG
load: control token: 128136 '<|reserved_special_token_128|>' is not marked as EOG
load: control token: 128135 '<|reserved_special_token_127|>' is not marked as EOG
load: control token: 128134 '<|reserved_special_token_126|>' is not marked as EOG
load: control token: 128133 '<|reserved_special_token_125|>' is not marked as EOG
load: control token: 128131 '<|reserved_special_token_123|>' is not marked as EOG
load: control token: 128128 '<|reserved_special_token_120|>' is not marked as EOG
load: control token: 128124 '<|reserved_special_token_116|>' is not marked as EOG
load: control token: 128123 '<|reserved_special_token_115|>' is not marked as EOG
load: control token: 128122 '<|reserved_special_token_114|>' is not marked as EOG
load: control token: 128119 '<|reserved_special_token_111|>' is not marked as EOG
load: control token: 128115 '<|reserved_special_token_107|>' is not marked as EOG
load: control token: 128112 '<|reserved_special_token_104|>' is not marked as EOG
load: control token: 128110 '<|reserved_special_token_102|>' is not marked as EOG
load: control token: 128109 '<|reserved_special_token_101|>' is not marked as EOG
load: control token: 128108 '<|reserved_special_token_100|>' is not marked as EOG
load: control token: 128106 '<|reserved_special_token_98|>' is not marked as EOG
load: control token: 128103 '<|reserved_special_token_95|>' is not marked as EOG
load: control token: 128102 '<|reserved_special_token_94|>' is not marked as EOG
load: control token: 128101 '<|reserved_special_token_93|>' is not marked as EOG
load: control token: 128097 '<|reserved_special_token_89|>' is not marked as EOG
load: control token: 128091 '<|reserved_special_token_83|>' is not marked as EOG
load: control token: 128090 '<|reserved_special_token_82|>' is not marked as EOG
load: control token: 128089 '<|reserved_special_token_81|>' is not marked as EOG
load: control token: 128087 '<|reserved_special_token_79|>' is not marked as EOG
load: control token: 128085 '<|reserved_special_token_77|>' is not marked as EOG
load: control token: 128081 '<|reserved_special_token_73|>' is not marked as EOG
load: control token: 128078 '<|reserved_special_token_70|>' is not marked as EOG
load: control token: 128076 '<|reserved_special_token_68|>' is not marked as EOG
load: control token: 128075 '<|reserved_special_token_67|>' is not marked as EOG
load: control token: 128073 '<|reserved_special_token_65|>' is not marked as EOG
load: control token: 128068 '<|reserved_special_token_60|>' is not marked as EOG
load: control token: 128067 '<|reserved_special_token_59|>' is not marked as EOG
load: control token: 128065 '<|reserved_special_token_57|>' is not marked as EOG
load: control token: 128063 '<|reserved_special_token_55|>' is not marked as EOG
load: control token: 128062 '<|reserved_special_token_54|>' is not marked as EOG
load: control token: 128060 '<|reserved_special_token_52|>' is not marked as EOG
load: control token: 128059 '<|reserved_special_token_51|>' is not marked as EOG
load: control token: 128057 '<|reserved_special_token_49|>' is not marked as EOG
load: control token: 128054 '<|reserved_special_token_46|>' is not marked as EOG
load: control token: 128046 '<|reserved_special_token_38|>' is not marked as EOG
load: control token: 128045 '<|reserved_special_token_37|>' is not marked as EOG
load: control token: 128044 '<|reserved_special_token_36|>' is not marked as EOG
load: control token: 128043 '<|reserved_special_token_35|>' is not marked as EOG
load: control token: 128038 '<|reserved_special_token_30|>' is not marked as EOG
load: control token: 128036 '<|reserved_special_token_28|>' is not marked as EOG
load: control token: 128035 '<|reserved_special_token_27|>' is not marked as EOG
load: control token: 128032 '<|reserved_special_token_24|>' is not marked as EOG
load: control token: 128028 '<|reserved_special_token_20|>' is not marked as EOG
load: control token: 128027 '<|reserved_special_token_19|>' is not marked as EOG
load: control token: 128024 '<|reserved_special_token_16|>' is not marked as EOG
load: control token: 128023 '<|reserved_special_token_15|>' is not marked as EOG
load: control token: 128022 '<|reserved_special_token_14|>' is not marked as EOG
load: control token: 128021 '<|reserved_special_token_13|>' is not marked as EOG
load: control token: 128018 '<|reserved_special_token_10|>' is not marked as EOG
load: control token: 128016 '<|reserved_special_token_8|>' is not marked as EOG
load: control token: 128015 '<|reserved_special_token_7|>' is not marked as EOG
load: control token: 128013 '<|reserved_special_token_5|>' is not marked as EOG
load: control token: 128011 '<|reserved_special_token_3|>' is not marked as EOG
load: control token: 128005 '<|reserved_special_token_2|>' is not marked as EOG
load: control token: 128004 '<|finetune_right_pad_id|>' is not marked as EOG
load: control token: 128002 '<|reserved_special_token_0|>' is not marked as EOG
load: control token: 128252 '<|reserved_special_token_244|>' is not marked as EOG
load: control token: 128190 '<|reserved_special_token_182|>' is not marked as EOG
load: control token: 128183 '<|reserved_special_token_175|>' is not marked as EOG
load: control token: 128137 '<|reserved_special_token_129|>' is not marked as EOG
load: control token: 128182 '<|reserved_special_token_174|>' is not marked as EOG
load: control token: 128040 '<|reserved_special_token_32|>' is not marked as EOG
load: control token: 128048 '<|reserved_special_token_40|>' is not marked as EOG
load: control token: 128092 '<|reserved_special_token_84|>' is not marked as EOG
load: control token: 128215 '<|reserved_special_token_207|>' is not marked as EOG
load: control token: 128107 '<|reserved_special_token_99|>' is not marked as EOG
load: control token: 128208 '<|reserved_special_token_200|>' is not marked as EOG
load: control token: 128145 '<|reserved_special_token_137|>' is not marked as EOG
load: control token: 128031 '<|reserved_special_token_23|>' is not marked as EOG
load: control token: 128129 '<|reserved_special_token_121|>' is not marked as EOG
load: control token: 128201 '<|reserved_special_token_193|>' is not marked as EOG
load: control token: 128074 '<|reserved_special_token_66|>' is not marked as EOG
load: control token: 128095 '<|reserved_special_token_87|>' is not marked as EOG
load: control token: 128186 '<|reserved_special_token_178|>' is not marked as EOG
load: control token: 128143 '<|reserved_special_token_135|>' is not marked as EOG
load: control token: 128229 '<|reserved_special_token_221|>' is not marked as EOG
load: control token: 128007 '<|end_header_id|>' is not marked as EOG
load: control token: 128055 '<|reserved_special_token_47|>' is not marked as EOG
load: control token: 128056 '<|reserved_special_token_48|>' is not marked as EOG
load: control token: 128061 '<|reserved_special_token_53|>' is not marked as EOG
load: control token: 128153 '<|reserved_special_token_145|>' is not marked as EOG
load: control token: 128152 '<|reserved_special_token_144|>' is not marked as EOG
load: control token: 128212 '<|reserved_special_token_204|>' is not marked as EOG
load: control token: 128172 '<|reserved_special_token_164|>' is not marked as EOG
load: control token: 128160 '<|reserved_special_token_152|>' is not marked as EOG
load: control token: 128041 '<|reserved_special_token_33|>' is not marked as EOG
load: control token: 128181 '<|reserved_special_token_173|>' is not marked as EOG
load: control token: 128094 '<|reserved_special_token_86|>' is not marked as EOG
load: control token: 128118 '<|reserved_special_token_110|>' is not marked as EOG
load: control token: 128236 '<|reserved_special_token_228|>' is not marked as EOG
load: control token: 128148 '<|reserved_special_token_140|>' is not marked as EOG
load: control token: 128042 '<|reserved_special_token_34|>' is not marked as EOG
load: control token: 128139 '<|reserved_special_token_131|>' is not marked as EOG
load: control token: 128173 '<|reserved_special_token_165|>' is not marked as EOG
load: control token: 128239 '<|reserved_special_token_231|>' is not marked as EOG
load: control token: 128157 '<|reserved_special_token_149|>' is not marked as EOG
load: control token: 128052 '<|reserved_special_token_44|>' is not marked as EOG
load: control token: 128026 '<|reserved_special_token_18|>' is not marked as EOG
load: control token: 128003 '<|reserved_special_token_1|>' is not marked as EOG
load: control token: 128019 '<|reserved_special_token_11|>' is not marked as EOG
load: control token: 128116 '<|reserved_special_token_108|>' is not marked as EOG
load: control token: 128161 '<|reserved_special_token_153|>' is not marked as EOG
load: control token: 128226 '<|reserved_special_token_218|>' is not marked as EOG
load: control token: 128159 '<|reserved_special_token_151|>' is not marked as EOG
load: control token: 128012 '<|reserved_special_token_4|>' is not marked as EOG
load: control token: 128088 '<|reserved_special_token_80|>' is not marked as EOG
load: control token: 128163 '<|reserved_special_token_155|>' is not marked as EOG
load: control token: 128001 '<|end_of_text|>' is not marked as EOG
load: control token: 128113 '<|reserved_special_token_105|>' is not marked as EOG
load: control token: 128250 '<|reserved_special_token_242|>' is not marked as EOG
load: control token: 128125 '<|reserved_special_token_117|>' is not marked as EOG
load: control token: 128053 '<|reserved_special_token_45|>' is not marked as EOG
load: control token: 128224 '<|reserved_special_token_216|>' is not marked as EOG
load: control token: 128247 '<|reserved_special_token_239|>' is not marked as EOG
load: control token: 128251 '<|reserved_special_token_243|>' is not marked as EOG
load: control token: 128216 '<|reserved_special_token_208|>' is not marked as EOG
load: control token: 128006 '<|start_header_id|>' is not marked as EOG
load: control token: 128211 '<|reserved_special_token_203|>' is not marked as EOG
load: control token: 128077 '<|reserved_special_token_69|>' is not marked as EOG
load: control token: 128237 '<|reserved_special_token_229|>' is not marked as EOG
load: control token: 128086 '<|reserved_special_token_78|>' is not marked as EOG
load: control token: 128227 '<|reserved_special_token_219|>' is not marked as EOG
load: control token: 128058 '<|reserved_special_token_50|>' is not marked as EOG
load: control token: 128100 '<|reserved_special_token_92|>' is not marked as EOG
load: control token: 128209 '<|reserved_special_token_201|>' is not marked as EOG
load: control token: 128084 '<|reserved_special_token_76|>' is not marked as EOG
load: control token: 128071 '<|reserved_special_token_63|>' is not marked as EOG
load: control token: 128070 '<|reserved_special_token_62|>' is not marked as EOG
load: control token: 128049 '<|reserved_special_token_41|>' is not marked as EOG
load: control token: 128197 '<|reserved_special_token_189|>' is not marked as EOG
load: control token: 128072 '<|reserved_special_token_64|>' is not marked as EOG
load: control token: 128000 '<|begin_of_text|>' is not marked as EOG
load: control token: 128223 '<|reserved_special_token_215|>' is not marked as EOG
load: control token: 128217 '<|reserved_special_token_209|>' is not marked as EOG
load: control token: 128111 '<|reserved_special_token_103|>' is not marked as EOG
load: control token: 128203 '<|reserved_special_token_195|>' is not marked as EOG
load: control token: 128051 '<|reserved_special_token_43|>' is not marked as EOG
load: control token: 128030 '<|reserved_special_token_22|>' is not marked as EOG
load: control token: 128117 '<|reserved_special_token_109|>' is not marked as EOG
load: control token: 128010 '<|python_tag|>' is not marked as EOG
load: control token: 128238 '<|reserved_special_token_230|>' is not marked as EOG
load: control token: 128255 '<|reserved_special_token_247|>' is not marked as EOG
load: control token: 128202 '<|reserved_special_token_194|>' is not marked as EOG
load: control token: 128132 '<|reserved_special_token_124|>' is not marked as EOG
load: control token: 128248 '<|reserved_special_token_240|>' is not marked as EOG
load: control token: 128167 '<|reserved_special_token_159|>' is not marked as EOG
load: control token: 128127 '<|reserved_special_token_119|>' is not marked as EOG
load: control token: 128105 '<|reserved_special_token_97|>' is not marked as EOG
load: control token: 128039 '<|reserved_special_token_31|>' is not marked as EOG
load: control token: 128232 '<|reserved_special_token_224|>' is not marked as EOG
load: control token: 128166 '<|reserved_special_token_158|>' is not marked as EOG
load: control token: 128130 '<|reserved_special_token_122|>' is not marked as EOG
load: control token: 128114 '<|reserved_special_token_106|>' is not marked as EOG
load: control token: 128234 '<|reserved_special_token_226|>' is not marked as EOG
load: control token: 128191 '<|reserved_special_token_183|>' is not marked as EOG
load: control token: 128064 '<|reserved_special_token_56|>' is not marked as EOG
load: control token: 128140 '<|reserved_special_token_132|>' is not marked as EOG
load: control token: 128096 '<|reserved_special_token_88|>' is not marked as EOG
load: control token: 128098 '<|reserved_special_token_90|>' is not marked as EOG
load: control token: 128192 '<|reserved_special_token_184|>' is not marked as EOG
load: control token: 128093 '<|reserved_special_token_85|>' is not marked as EOG
load: control token: 128150 '<|reserved_special_token_142|>' is not marked as EOG
load: control token: 128222 '<|reserved_special_token_214|>' is not marked as EOG
load: control token: 128233 '<|reserved_special_token_225|>' is not marked as EOG
load: control token: 128220 '<|reserved_special_token_212|>' is not marked as EOG
load: control token: 128034 '<|reserved_special_token_26|>' is not marked as EOG
load: control token: 128033 '<|reserved_special_token_25|>' is not marked as EOG
load: control token: 128253 '<|reserved_special_token_245|>' is not marked as EOG
load: control token: 128195 '<|reserved_special_token_187|>' is not marked as EOG
load: control token: 128099 '<|reserved_special_token_91|>' is not marked as EOG
load: control token: 128189 '<|reserved_special_token_181|>' is not marked as EOG
load: control token: 128210 '<|reserved_special_token_202|>' is not marked as EOG
load: control token: 128174 '<|reserved_special_token_166|>' is not marked as EOG
load: control token: 128083 '<|reserved_special_token_75|>' is not marked as EOG
load: control token: 128080 '<|reserved_special_token_72|>' is not marked as EOG
load: control token: 128104 '<|reserved_special_token_96|>' is not marked as EOG
load: control token: 128082 '<|reserved_special_token_74|>' is not marked as EOG
load: control token: 128219 '<|reserved_special_token_211|>' is not marked as EOG
load: control token: 128017 '<|reserved_special_token_9|>' is not marked as EOG
load: control token: 128050 '<|reserved_special_token_42|>' is not marked as EOG
load: control token: 128205 '<|reserved_special_token_197|>' is not marked as EOG
load: control token: 128047 '<|reserved_special_token_39|>' is not marked as EOG
load: control token: 128164 '<|reserved_special_token_156|>' is not marked as EOG
load: control token: 128020 '<|reserved_special_token_12|>' is not marked as EOG
load: control token: 128069 '<|reserved_special_token_61|>' is not marked as EOG
load: control token: 128245 '<|reserved_special_token_237|>' is not marked as EOG
load: control token: 128121 '<|reserved_special_token_113|>' is not marked as EOG
load: control token: 128079 '<|reserved_special_token_71|>' is not marked as EOG
load: control token: 128037 '<|reserved_special_token_29|>' is not marked as EOG
load: control token: 128244 '<|reserved_special_token_236|>' is not marked as EOG
load: control token: 128029 '<|reserved_special_token_21|>' is not marked as EOG
load: control token: 128221 '<|reserved_special_token_213|>' is not marked as EOG
load: control token: 128066 '<|reserved_special_token_58|>' is not marked as EOG
load: control token: 128120 '<|reserved_special_token_112|>' is not marked as EOG
load: control token: 128014 '<|reserved_special_token_6|>' is not marked as EOG
load: control token: 128025 '<|reserved_special_token_17|>' is not marked as EOG
load: control token: 128126 '<|reserved_special_token_118|>' is not marked as EOG
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 3072
print_info: n_layer = 28
print_info: n_head = 24
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 3
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: n_ff = 8192
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 131072
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 3B
print_info: model params = 3.21 B
print_info: general.name = Llama 3.2 3B Instruct
print_info: vocab type = BPE
print_info: n_vocab = 128256
print_info: n_merges = 280147
print_info: BOS token = 128000 '<|begin_of_text|>'
print_info: EOS token = 128009 '<|eot_id|>'
print_info: EOT token = 128009 '<|eot_id|>'
print_info: EOM token = 128008 '<|eom_id|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 128008 '<|eom_id|>'
print_info: EOG token = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: layer 0 assigned to device Vulkan0
load_tensors: layer 1 assigned to device Vulkan0
load_tensors: layer 2 assigned to device Vulkan0
load_tensors: layer 3 assigned to device Vulkan0
load_tensors: layer 4 assigned to device Vulkan0
load_tensors: layer 5 assigned to device Vulkan0
load_tensors: layer 6 assigned to device Vulkan0
load_tensors: layer 7 assigned to device Vulkan0
load_tensors: layer 8 assigned to device Vulkan0
load_tensors: layer 9 assigned to device Vulkan0
load_tensors: layer 10 assigned to device Vulkan0
load_tensors: layer 11 assigned to device Vulkan0
load_tensors: layer 12 assigned to device Vulkan0
load_tensors: layer 13 assigned to device Vulkan0
load_tensors: layer 14 assigned to device Vulkan0
load_tensors: layer 15 assigned to device Vulkan0
load_tensors: layer 16 assigned to device Vulkan0
load_tensors: layer 17 assigned to device Vulkan0
load_tensors: layer 18 assigned to device Vulkan0
load_tensors: layer 19 assigned to device Vulkan0
load_tensors: layer 20 assigned to device Vulkan0
load_tensors: layer 21 assigned to device Vulkan0
load_tensors: layer 22 assigned to device Vulkan0
load_tensors: layer 23 assigned to device Vulkan0
load_tensors: layer 24 assigned to device Vulkan0
load_tensors: layer 25 assigned to device Vulkan0
load_tensors: layer 26 assigned to device Vulkan0
load_tensors: layer 27 assigned to device Vulkan0
load_tensors: layer 28 assigned to device Vulkan0
load_tensors: tensor 'token_embd.weight' (q6_K) (and 0 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
load_tensors: offloading 28 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors: Vulkan0 model buffer size = 1918.35 MiB
load_tensors: CPU_Mapped model buffer size = 308.23 MiB
llama_init_from_model: n_seq_max = 1
llama_init_from_model: n_ctx = 2048
llama_init_from_model: n_ctx_per_seq = 2048
llama_init_from_model: n_batch = 2048
llama_init_from_model: n_ubatch = 512
llama_init_from_model: flash_attn = 0
llama_init_from_model: freq_base = 500000.0
llama_init_from_model: freq_scale = 1
llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 2048, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 28, can_shift = 1
llama_kv_cache_init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 2: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 3: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 4: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 5: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 6: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 7: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 8: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 9: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 10: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 11: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 12: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 13: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 14: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 15: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 16: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 17: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 18: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 19: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 20: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 21: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 22: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 23: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 24: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 25: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 26: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 27: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: Vulkan0 KV buffer size = 224.00 MiB
llama_init_from_model: KV self size = 224.00 MiB, K (f16): 112.00 MiB, V (f16): 112.00 MiB
llama_init_from_model: Vulkan_Host output buffer size = 0.49 MiB
llama_init_from_model: Vulkan0 compute buffer size = 256.50 MiB
llama_init_from_model: Vulkan_Host compute buffer size = 10.01 MiB
llama_init_from_model: graph nodes = 902
llama_init_from_model: graph splits = 2
**Hello World Program in Python**

...

Execution time comparison:
Worth exploring IMO
An attempt to offload a workload to the NVIDIA GPU on Fedora 41 generates the following error:

It looks like the quay.io/ramalama/cuda:latest container is based on CUDA 12.8.0, while CUDA on Fedora 41 is currently based on 12.7, which comes from the following package:
Full log:
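A sketch of how to confirm the mismatch described above; `nvidia-smi` on the host reports the driver's maximum supported CUDA version, and NVIDIA's official CUDA base images ship a `/usr/local/cuda/version.json` describing the toolkit inside the container (that path is an assumption about how the RamaLama image is built):

```bash
# Host side: the maximum CUDA version the installed driver supports.
nvidia-smi | grep "CUDA Version"

# Container side: the CUDA toolkit version baked into the image
# (path assumed from NVIDIA's official CUDA base images).
podman run --rm quay.io/ramalama/cuda:latest cat /usr/local/cuda/version.json
```

If the container's toolkit is newer than what the host driver supports, the "PTX was compiled with an unsupported toolchain" error above is the expected failure mode.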