Faster Custom Paged Attention kernels #372
base: main
Conversation
tests/kernels/test_attention.py (outdated)
```diff
@@ -117,7 +117,7 @@ def ref_single_query_cached_kv_attention(

 @pytest.mark.parametrize(
     "version",
-    ["v1", "v2"] if not current_platform.is_rocm() else ["v1", "v2", "rocm"])
+    ["v1", "v2"] if not current_platform.is_rocm() else ["rocm"])
```
Don't we still want to run unit tests on v1/v2? These backends are still supported.
@gshtras Yes, but the v1/v2 unit tests are broken; I saw errors when I tried running them.
We have suggested fixes to the v1, v2 unit tests in this PR (#383).
Let's re-target #383 to this branch and combine them.
csrc/rocm/attention.cu (outdated)
```diff
 }

 #define CALL_CUSTOM_LAUNCHER(T, KVT, KV_DTYPE, BLK_SIZE, HEAD_SIZE, OUTT, \
-                             PSIZE)                                        \
+                             PSIZE, ALIBI_ENABLED)                         \
```
This tells me that the CI linters are broken...
I missed running clang-format, let me do that.
On the plus side, this exposed the CI linters issue :)
We'll hopefully get them back soon.
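Stepping back from the formatting discussion: for readers skimming the `CALL_CUSTOM_LAUNCHER` diff above, here is a minimal, self-contained C++ sketch (hypothetical names, not vLLM's actual launcher code) of why the macro grows an `ALIBI_ENABLED` parameter. The idea is to forward the flag as a compile-time boolean so the ALiBi-bias path can be compiled out of the kernel when it is not needed.

```cpp
// Hypothetical, simplified sketch -- not the PR's actual code.
#include <cstdio>

// The flag arrives as a non-type template parameter, so the ALiBi branch
// is resolved at compile time and absent from the "disabled" instantiation.
template <typename T, int HEAD_SIZE, bool ALIBI_ENABLED>
void custom_paged_attention_launcher(const float* alibi_slopes) {
  if constexpr (ALIBI_ENABLED) {
    // Per-head ALiBi slopes would be passed to the kernel and added to
    // the attention logits.
    std::printf("launch <HEAD_SIZE=%d> with ALiBi bias (%p)\n", HEAD_SIZE,
                static_cast<const void*>(alibi_slopes));
  } else {
    std::printf("launch <HEAD_SIZE=%d> without ALiBi bias\n", HEAD_SIZE);
  }
}

// The macro stamps out the launcher for one combination of compile-time
// parameters; the new ALIBI_ENABLED argument is just forwarded along.
#define CALL_CUSTOM_LAUNCHER(T, HEAD_SIZE, ALIBI_ENABLED) \
  custom_paged_attention_launcher<T, HEAD_SIZE, ALIBI_ENABLED>(alibi_slopes)

void dispatch(const float* alibi_slopes) {
  if (alibi_slopes != nullptr) {
    CALL_CUSTOM_LAUNCHER(float, 128, true);
  } else {
    CALL_CUSTOM_LAUNCHER(float, 128, false);
  }
}

int main() {
  dispatch(nullptr);  // takes the compiled-out-bias path
  return 0;
}
```

The payoff of this pattern is that the no-bias instantiation carries no per-token bias arithmetic at all, rather than branching on a runtime flag inside the kernel.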
Ported ROCm/vllm changes to upstream vLLM

This commit manually ports changes from ROCm/vllm (ROCm#372) to upstream vLLM. The original work was done by sanyalington.

Co-authored-by: sanyalington <[email protected]>
Signed-off-by: vllmellm <[email protected]>
Approving based on latest accuracy numbers.
* [Bugfix]: fix paged attention tests based on the updated kernels in `csrc/attention/paged_attention_v1.cu`, `csrc/attention/paged_attention_v2.cu`, and `csrc/rocm/attention.cu`.
* Improve code documentation.
* Lint.

Co-authored-by: vllmellm <[email protected]>
Faster custom paged attention (CPA) kernel based on mfma16x16x16 instructions.
Performance is the same as the numbers reported in #347.
Fixed the tests/kernels/test_attention.py::test_paged_attention unit test for the CPA ROCm kernel.
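For context on the instruction family the description refers to, below is a minimal HIP sketch of a single `mfma_f32_16x16x16f16` tile multiply: one 64-lane wavefront computes a 16x16 fp16 x fp16 product with fp32 accumulation in a single MFMA instruction. This is illustrative only; the lane-to-element mapping shown is the assumed CDNA layout and should be checked against AMD's matrix-instruction reference, and the actual kernel in `csrc/rocm/attention.cu` is considerably more involved.

```cpp
// Minimal MFMA demo (hypothetical, simplified). Requires a CDNA target with
// MFMA support (e.g. gfx908/gfx90a/gfx94x) and hipcc.
#include <hip/hip_runtime.h>

typedef _Float16 half4_t __attribute__((ext_vector_type(4)));
typedef float floatx4 __attribute__((ext_vector_type(4)));

// D = A * B, where A and B are 16x16 fp16 matrices (row-major) and D is a
// 16x16 fp32 matrix. One wavefront (64 lanes) issues a single MFMA.
__global__ void mfma_16x16x16_demo(const _Float16* __restrict__ A,
                                   const _Float16* __restrict__ B,
                                   float* __restrict__ D) {
  const int lane = threadIdx.x;      // 0..63 within the wavefront
  const int mn = lane % 16;          // A row index / B and D column index
  const int base = (lane / 16) * 4;  // this lane's 4-element offset

  // Each lane supplies 4 fp16 values of A and of B (assumed CDNA layout).
  half4_t a, b;
  for (int i = 0; i < 4; ++i) {
    a[i] = A[mn * 16 + (base + i)];  // A[mn][k]
    b[i] = B[(base + i) * 16 + mn];  // B[k][mn]
  }

  floatx4 acc = {0.f, 0.f, 0.f, 0.f};
  // Full 16x16x16 multiply-accumulate in one instruction.
  acc = __builtin_amdgcn_mfma_f32_16x16x16f16(a, b, acc, 0, 0, 0);

  // Each lane ends up holding 4 of the 256 fp32 results.
  for (int i = 0; i < 4; ++i) {
    D[(base + i) * 16 + mn] = acc[i];  // D[row][mn]
  }
}
```

Launched as a single 64-lane wavefront, e.g. `mfma_16x16x16_demo<<<dim3(1), dim3(64)>>>(A, B, D);` with device pointers to the two input tiles and the output tile.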