Commit

Additional tuning for grouped page attention kernel.
Changed:
- waves_per_eu
whchung committed Feb 1, 2025
1 parent c145acb commit 9c5980e
Showing 1 changed file with 1 addition and 1 deletion.
@@ -436,7 +436,7 @@ def _decode_grouped_att_m_fwd(
     if is_hip_:
         # https://rocm.docs.amd.com/en/docs-6.2.0/how-to/llm-fine-tuning-optimization/optimizing-triton-kernel.html
         # https://github.com/triton-lang/triton/blob/main/third_party/amd/backend/compiler.py
-        extra_kargs = {"waves_per_eu": 4, "matrix_instr_nonkdim": 16, "kpack": 2}
+        extra_kargs = {"waves_per_eu": 1, "matrix_instr_nonkdim": 16, "kpack": 2}

     _fwd_grouped_kernel_stage1[grid](
         q,

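For context, the dictionary touched by this commit holds AMD-specific Triton options that are selected only when running on HIP. The following is a minimal sketch of that pattern, not the repository's code: `build_extra_kargs` and `launch` are hypothetical names, `launch` is a plain-Python stand-in for the real `_fwd_grouped_kernel_stage1[grid](...)` call, and the empty dictionary for non-HIP backends is an assumption. The option values are copied from the post-commit line.

```python
def build_extra_kargs(is_hip: bool) -> dict:
    """Hypothetical helper: pick backend-specific kernel options."""
    if is_hip:
        # Values from the line added in this commit (waves_per_eu: 4 -> 1).
        return {"waves_per_eu": 1, "matrix_instr_nonkdim": 16, "kpack": 2}
    # Assumption: other backends receive no extra options.
    return {}


def launch(*args, **kwargs):
    """Stand-in for a kernel launch; returns the keyword options it received."""
    return kwargs


# The options are unpacked into the launch call as keyword arguments.
hip_opts = launch("q", **build_extra_kargs(is_hip=True))
other_opts = launch("q", **build_extra_kargs(is_hip=False))
```

Keeping the options in one dictionary means a tuning change like this one stays a one-line edit, and the launch call itself does not need to branch on the backend.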