[Misc] Pass attention to impl backend #12218
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
vllm/attention/backends/abstract.py (Outdated)
@@ -244,13 +244,12 @@ def __init__(
     @abstractmethod
     def forward(
         self,
+        layer: torch.nn.Module,
Can we specify an interface for the attention layer explicitly?
Sorry if I missed your suggestion. Do you mean adding an interface in Attention and getting/setting parameters there? Attention inherits from torch.nn.Module, which already provides the register_parameter and get_parameter functions.
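For reference, a small self-contained example of the existing torch.nn.Module mechanism mentioned above (the class and parameter names are made up for illustration, not vLLM's actual ones):

```python
import torch


class Attention(torch.nn.Module):
    """Stand-in for an attention layer; vLLM's Attention also subclasses nn.Module."""


layer = Attention()
# register_parameter() attaches a named Parameter to the module.
layer.register_parameter("k_scale", torch.nn.Parameter(torch.ones(1)))
# get_parameter() looks the Parameter up again by name.
print(layer.get_parameter("k_scale"))
```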
I mean that we can define a typing.Protocol so we know which attributes of the layer we are supposed to access.
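For illustration, a minimal sketch of such a Protocol, assuming the backend only needs the quantization scale attributes (the names `_k_scale` and `_v_scale` are placeholders here, not necessarily the final interface):

```python
# Minimal sketch; the attribute names are illustrative assumptions.
from typing import Protocol

import torch


class AttentionLayer(Protocol):
    """Attributes of the attention layer that a backend impl may access."""

    _k_scale: torch.Tensor
    _v_scale: torch.Tensor
```

Annotating the `layer` argument with such a Protocol instead of `torch.nn.Module` documents exactly which attributes the backend relies on, without requiring any inheritance relationship.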
Got it, just pushed a new commit. I'm not sure whether it's what you want, so I need your feedback. Thanks.
wangxiyuan force-pushed 78dedb9 to 6a5a4e5 (Signed-off-by: wangxiyuan <[email protected]>)
Yes, this is what I meant. Thanks for updating this!
This reverts commit 86bfb6d.
With #11969, a quantization method can now be implemented and registered with vLLM out of tree. However, there is no way for a custom attention backend impl to use the parameters that such a method registers on the attention layer.
This PR passes the attention layer object to the attention backend, so that the backend can use the parameters registered by the quantized attention method directly, consistent with the other kinds of quantization methods (Linear, MoE).
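To make the intended flow concrete, here is a rough sketch (all class, method, and parameter names below are hypothetical, not the actual vLLM interfaces): an out-of-tree quantization method registers a scale parameter on the attention layer, and the backend impl, which now receives the layer in forward(), reads that parameter back.

```python
# Hypothetical sketch of the intended flow; names are illustrative only.
import torch


class MyQuantAttentionMethod:
    """Out-of-tree quantization method for the attention layer."""

    def create_weights(self, layer: torch.nn.Module) -> None:
        # Register the scale on the attention module, analogous to what the
        # Linear/MoE quantization methods do for their layers.
        layer.register_parameter(
            "_k_scale",
            torch.nn.Parameter(torch.ones(1), requires_grad=False))


class MyBackendImpl:
    """Custom attention backend implementation."""

    def forward(self, layer: torch.nn.Module,
                query: torch.Tensor) -> torch.Tensor:
        # Other arguments (key, value, kv_cache, metadata) omitted for brevity.
        # Because the attention layer is now passed in, the backend can read
        # parameters registered by the quantization method directly.
        k_scale = layer.get_parameter("_k_scale")
        return query * k_scale
```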