[Feat] Sage Attention Support for Triton kernel #929

Open
wants to merge 15 commits into
base: develop

@l1cacheDell l1cacheDell commented Dec 27, 2024

Sage attention Triton kernel Support

For now, this PR supports only the is_causal=False case. The is_causal=True case is implemented in another PR.
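
For readers unfamiliar with the causal flag, here is a minimal pure-Python sketch of what the two settings mean for ordinary softmax attention. It is not the Triton kernel from this PR: the function name `attention`, the list-of-rows layout, and the assumption that queries and keys have the same length are all illustrative choices.

```python
import math

def attention(q, k, v, is_causal=False):
    """Single-head softmax attention over lists of row vectors (illustrative only)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dim = len(v[0])
    out = []
    for i, q_row in enumerate(q):
        # With is_causal=True, query i may only attend to keys 0..i.
        # Assumes len(q) == len(k) in the causal case.
        n_keys = i + 1 if is_causal else len(k)
        scores = [scale * sum(a * b for a, b in zip(q_row, k[j]))
                  for j in range(n_keys)]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j][d] for j, w in enumerate(weights)) / total
                    for d in range(dim)])
    return out
```

In the non-causal case every query sees every key, which is the setting this PR covers.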

Performance results and the comparison with FA2 are presented in the inflow documents, so they are omitted here.

CLAassistant commented Dec 27, 2024

CLA assistant check
All committers have signed the CLA.

paddle-bot bot commented Dec 27, 2024

Thanks for your contribution!

@l1cacheDell l1cacheDell changed the title from "fix CLAPAudioCfg assertion error" to "[Feat] Sage Attention Support for Triton kernel" Jan 2, 2025
@l1cacheDell l1cacheDell marked this pull request as draft January 2, 2025 11:59
@l1cacheDell l1cacheDell marked this pull request as ready for review January 5, 2025 08:01
PD_BUILD_OP(${op_name})
.Inputs({"x", "k_tensor", "v_tensor", "q_scale", "k_scale"})
.Outputs({"out_tensor", "lse_tensor"})
.Attrs({"output_dtype: std::string", "tensor_layout: std::string", "return_lse: int"})
Contributor

Is lse needed? If it is confirmed that inference does not need it, could it be removed?

Author

Multi-GPU parallel inference may need it enabled, so I suggest keeping it.
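
One common reason a log-sum-exp (LSE) output matters in multi-device settings is that partial attention results computed over different key/value shards can be merged exactly when each shard also reports its LSE. The sketch below illustrates that general technique in pure Python for a single query. It is an assumption about why `lse_tensor` might be useful, not a description of this PR's kernel, and the helper names `partial_attention` and `merge_partials` are hypothetical.

```python
import math

def partial_attention(scores, values):
    """Attention for one query over one shard of keys: returns (output, lse)."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total
           for d in range(dim)]
    # lse = log(sum(exp(scores))), computed stably.
    return out, m + math.log(total)

def merge_partials(partials):
    """Combine per-shard (output, lse) pairs into the full-attention result."""
    m = max(lse for _, lse in partials)
    total_lse = m + math.log(sum(math.exp(lse - m) for _, lse in partials))
    dim = len(partials[0][0])
    merged = [0.0] * dim
    for out, lse in partials:
        # Each shard is weighted by its share of the global softmax mass.
        w = math.exp(lse - total_lse)
        for d in range(dim):
            merged[d] += w * out[d]
    return merged, total_lse
```

Without the per-shard LSE, the shard outputs could not be reweighted correctly, since each one is normalized only over its own keys.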

l1cacheDell commented Jan 10, 2025

  • Cosine similarity (sageattn output vs. FA2 output): 0.9999207854270935
  • L1: 0.01878473162651062
  • Max diff (sageattn output vs. FA2 output): 0.00988770
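
For reference, metrics of this kind can be computed as in the sketch below, which works on two flattened outputs. The comment does not say how L1 is normalized. Since a mean absolute error can never exceed the max diff, and the reported L1 is larger than the reported max diff, the sketch assumes a relative L1 (sum of absolute differences divided by the sum of absolute reference values). That normalization and the function name `compare_outputs` are assumptions, not taken from the PR's test script.

```python
import math

def compare_outputs(candidate, reference):
    """Compare two flat sequences of floats, e.g. flattened attention outputs."""
    if len(candidate) != len(reference):
        raise ValueError("inputs must have the same length")
    dot = sum(c * r for c, r in zip(candidate, reference))
    norm_c = math.sqrt(sum(c * c for c in candidate))
    norm_r = math.sqrt(sum(r * r for r in reference))
    abs_diffs = [abs(c - r) for c, r in zip(candidate, reference)]
    return {
        "cosine_similarity": dot / (norm_c * norm_r),
        # Assumed normalization: relative L1 against the reference.
        "relative_l1": sum(abs_diffs) / sum(abs(r) for r in reference),
        "max_diff": max(abs_diffs),
    }
```

Here `candidate` would hold the sageattn output and `reference` the FA2 output, both flattened to one dimension.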

@l1cacheDell
Author

This PR has been cleaned up and is now dedicated solely to the sageattn Triton kernel.

Changes to other files were removed to keep the PR and its review focused.

Comments were added to describe the parameters and function usage.

All code passes the test scripts. The test scripts will be shared on inflow rather than in this PR.

4 participants