Tune paged attention parameters for AMD GPU. #3255

Merged: 5 commits into sgl-project:main on Feb 2, 2025

Conversation

@whchung whchung (Contributor) commented Feb 1, 2025

Motivation

Fine-tune SGLang paged attention kernel performance on AMD MI GPUs for LLM workloads.

Modifications

Changes:

  • num_kv_splits: 8 -> 16
  • BLOCK: 64 -> 8
  • num_warps: 2 -> 1 when kv_group_num is greater than 1
  • waves_per_cu: 4 -> 1 in the grouped paged attention kernel

These knobs have been tested with a couple of workloads on the AMD ROCm platform; a hedged sketch of how such defaults might be selected follows below.
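A minimal sketch of how the kernel-side knobs might be gated per platform, assuming the is_hip() helper mentioned in the commits below and using torch.version.hip for the check; the function name and dict layout are illustrative, not SGLang's actual kernel code:

import torch

def is_hip() -> bool:
    # True when PyTorch was built against ROCm/HIP.
    return torch.version.hip is not None

def grouped_decode_launch_knobs() -> dict:
    # Hypothetical helper: Triton launch knobs for the grouped paged-attention
    # kernel (kv_group_num > 1), gated on the ROCm backend as in this PR.
    if is_hip():
        # AMD values from this PR.
        return {"BLOCK": 8, "num_warps": 1, "waves_per_cu": 1}
    # Previous defaults kept for non-ROCm platforms (waves_per_cu is AMD-only).
    return {"BLOCK": 64, "num_warps": 2}

The num_kv_splits change is applied separately as a server-argument default, as shown in the commit below.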

Changes:
- num_kv_splits
- BLOCK
- num_warps
@zhyncs zhyncs requested a review from HaiShaw February 1, 2025 17:26
whchung and others added 3 commits February 1, 2025 11:30
Make the tuned knobs applicable only on the AMD ROCm platform via is_hip checks.

# AMD-specific Triton attention KV splits default number
if is_hip():
    self.triton_attention_num_kv_splits = 16
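For context, a hedged sketch of where this default could live, assuming a dataclass-style ServerArgs with a post-init hook; apart from triton_attention_num_kv_splits and is_hip, the structure is illustrative rather than SGLang's exact code:

from dataclasses import dataclass
import torch

def is_hip() -> bool:
    # True when PyTorch was built against ROCm/HIP.
    return torch.version.hip is not None

@dataclass
class ServerArgs:
    triton_attention_num_kv_splits: int = 8  # default on non-AMD platforms

    def __post_init__(self):
        # AMD-specific Triton attention KV splits default number
        if is_hip():
            self.triton_attention_num_kv_splits = 16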

A Collaborator commented:
@whchung we noticed that the default of 8 works better in most cases. Is 16 slightly better across the board in your testing, or mainly better for the long sequences you are interested in?

@HaiShaw HaiShaw commented Feb 2, 2025

This PR boosts grok, etc., as observed.

@HaiShaw HaiShaw merged commit d9eb935 into sgl-project:main Feb 2, 2025
10 checks passed
@HaiShaw HaiShaw added the "good first issue" (Good for newcomers) label Feb 2, 2025