Thanks for the great work! A quick question: are there any plans to support paged attention? We are actually working on this under aotriton v0.8. (We already saw some performance gains under v0.4 beta, but the autotuning procedure introduced in v0.5 is fairly heavy, so the upgrade may take some time.) We'd like to open a new PR once the implementation is ready.
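For context, the core idea of paged attention is to store the KV cache in fixed-size blocks ("pages") indexed through a per-sequence block table, rather than in one contiguous buffer per sequence. A minimal illustrative sketch (not aotriton's implementation; the names and shapes here are assumptions for illustration):

```python
import numpy as np

def gather_paged_kv(kv_cache, block_table, seq_len, block_size):
    """Reassemble one sequence's contiguous K (or V) tensor from paged blocks.

    kv_cache:    (num_blocks, block_size, head_dim) shared pool of pages
    block_table: list of page indices assigned to this sequence, in order
    seq_len:     number of valid tokens for this sequence
    """
    num_pages = (seq_len + block_size - 1) // block_size  # ceil division
    # Gather this sequence's pages from the shared pool, then trim padding
    # in the final (possibly partially filled) page.
    parts = [kv_cache[block_table[i]] for i in range(num_pages)]
    return np.concatenate(parts, axis=0)[:seq_len]

# Pool of 4 pages, 2 tokens per page, head_dim 1.
kv_cache = np.arange(8, dtype=np.float32).reshape(4, 2, 1)
block_table = [2, 0]  # sequence occupies page 2 then page 0
k = gather_paged_kv(kv_cache, block_table, seq_len=3, block_size=2)
```

A real kernel would index the pages directly inside the attention loop instead of materializing a contiguous copy; the gather above only illustrates the block-table indirection.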
Operating System
No response
GPU
MI200/MI300
ROCm Component
No response
Paged Attention is on our radar, but its priority is uncertain since there are already quite a few other improvements on the 0.9b schedule.
One thing to note: in the upcoming release we will reorganize the directory structure.
All test/*.py and tritonsrc/*.py files will be moved to test/FA/ and tritonsrc/FA/ (FA stands for flash-attention, but the name may change in practice). You may therefore need to rebase your code at some point.
while the autotuning procedure introduced in v0.5 is kinda heavy thus the upgrade may take some time
The autotune process has changed a lot since it was first introduced. It no longer depends on Triton's autotune infrastructure (although it still shares some code with it), and it can utilize multi-core CPUs.
The overall tuning process is documented in How To Generate Tuning Database.md.
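The multi-core tuning described above can be sketched as a parallel sweep over candidate kernel configurations, keeping the fastest one. This is only an illustrative outline, not aotriton's tuner; `benchmark` and its fake latency model are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor
import itertools

def benchmark(config):
    """Placeholder for timing one kernel configuration.

    A real harness would compile and run the kernel; here we fake a
    latency surface whose minimum is at (64, 64) purely for illustration.
    """
    block_m, block_n = config
    return abs(block_m - 64) + abs(block_n - 64)

def tune(configs, workers=4):
    """Benchmark all configs in parallel and return the fastest one.

    A thread pool suffices when each trial shells out to a compiler
    subprocess; a process pool serves the same role for CPU-bound work.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(benchmark, configs))
    return min(zip(latencies, configs))[1]

configs = list(itertools.product([32, 64, 128], [32, 64, 128]))
best = tune(configs)
```

The expensive part in practice is the per-config compile-and-run step, which is why parallelizing across CPU cores shortens the overall tuning wall time.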