
[Feature]: Paged attention support #65

Open
xinji1 opened this issue Jan 13, 2025 · 1 comment

Comments


xinji1 commented Jan 13, 2025

Suggestion Description

Thanks for the great work! A simple question: is there any plan to support paged attention? We are actually working on this under aotriton v0.8. (We already saw a performance gain under v0.4 beta, but the autotuning procedure introduced in v0.5 is fairly heavy, so the upgrade may take some time.) We'd like to open a PR once the implementation is ready.
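For readers unfamiliar with the feature being requested, here is a minimal sketch of the core idea behind paged attention: the KV cache is stored in fixed-size physical blocks, and a per-sequence block table maps logical token positions to those blocks, so a sequence's cache need not be contiguous. All names below (`k_cache`, `block_table`, `gather_keys`) are illustrative, not part of aotriton's API.

```python
import numpy as np

BLOCK_SIZE = 4    # tokens per physical cache block
NUM_BLOCKS = 8    # physical blocks in the shared KV-cache pool
HEAD_DIM = 16

rng = np.random.default_rng(0)
# Physical key pool: [num_blocks, block_size, head_dim]
k_cache = rng.standard_normal((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM))

# Block table for one sequence of 10 tokens: logical block i -> physical block.
# ceil(10 / 4) = 3 logical blocks, scattered across the pool.
block_table = np.array([5, 2, 7])
seq_len = 10

def gather_keys(k_cache, block_table, seq_len):
    """Gather this sequence's keys from the paged pool into [seq_len, head_dim]."""
    pos = np.arange(seq_len)
    phys_block = block_table[pos // BLOCK_SIZE]  # which physical block holds each token
    offset = pos % BLOCK_SIZE                    # slot within that block
    return k_cache[phys_block, offset]

keys = gather_keys(k_cache, block_table, seq_len)
assert keys.shape == (seq_len, HEAD_DIM)
# Token 6 lives in logical block 1 -> physical block 2, slot 2.
assert np.array_equal(keys[6], k_cache[2, 2])
```

In a real kernel this gather is fused into the attention computation rather than materialized, which is what makes the Triton implementation nontrivial.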

Operating System

No response

GPU

MI200/MI300

ROCm Component

No response

xinyazhang (Collaborator) commented Jan 15, 2025

You're welcome to contribute!

Paged attention is on our radar, but its priority is uncertain since quite a lot of other improvements are already on the 0.9b schedule.

One thing to pay attention to: in the upcoming release we will reorganize the directory structure. All test/*.py and tritonsrc/*.py files will be moved to test/FA/ and tritonsrc/FA/ (FA stands for flash-attention, but the name is subject to change). You may therefore need to rebase your code at some point.

> while the autotuning procedure introduced in v0.5 is kinda heavy thus the upgrade may take some time

The autotune process has changed a lot since it was first introduced. It no longer depends on Triton's autotune infrastructure (although it still shares some code), and it can utilize multi-core CPUs.
The overall tuning process is documented in How To Generate Tuning Database.md.
