[Misc] Pass attention to impl backend #12218
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
vllm/attention/backends/abstract.py (Outdated)
@@ -244,13 +244,12 @@ def __init__(
     @abstractmethod
     def forward(
         self,
+        layer: torch.nn.Module,
Can we specify an interface for the attention layer explicitly?
Sorry if I missed your suggestion. Do you mean adding an interface in Attention and getting/setting parameters there? Attention inherits from torch.nn.Module, which already provides the register_parameter and get_parameter functions.
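For reference, a small self-contained example of the existing torch.nn.Module mechanism mentioned above (the class and parameter names are made up for illustration, not vLLM's actual ones):

```python
import torch


class Attention(torch.nn.Module):
    """Stand-in for an attention layer; vLLM's Attention also subclasses nn.Module."""


layer = Attention()
# register_parameter() attaches a named Parameter to the module.
layer.register_parameter("k_scale", torch.nn.Parameter(torch.ones(1)))
# get_parameter() looks the Parameter up again by name.
print(layer.get_parameter("k_scale"))
```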
I mean that we can define a typing.Protocol so we know which attributes of the layer we are supposed to access.
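For illustration, a minimal sketch of such a Protocol, assuming the backend only needs the quantization scale attributes (the names `_k_scale` and `_v_scale` are placeholders here, not necessarily the final interface):

```python
# Minimal sketch; the attribute names are illustrative assumptions.
from typing import Protocol

import torch


class AttentionLayer(Protocol):
    """Attributes of the attention layer that a backend impl may access."""

    _k_scale: torch.Tensor
    _v_scale: torch.Tensor
```

Annotating the `layer` argument with such a Protocol instead of `torch.nn.Module` documents exactly which attributes the backend relies on, without requiring any inheritance relationship.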
Got it, just pushed a new commit. I'm not sure whether it's what you want, so I need your feedback. Thanks.
wangxiyuan force-pushed 78dedb9 to 6a5a4e5 (Signed-off-by: wangxiyuan <[email protected]>)
Yes, this is what I meant. Thanks for updating this!
This reverts commit 86bfb6d.
With #11969, a quantization method can now be implemented and registered with vLLM out of tree. However, there is no way for a custom attention backend impl to use the parameters that such a method registers on the attention layer.
This PR passes the attention layer object to the attention backend, so that the backend can use the parameters registered by the quantized attention method directly, consistent with the other kinds of quantization methods (Linear, MoE).
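To make the intended flow concrete, here is a rough sketch (all class, method, and parameter names below are hypothetical, not the actual vLLM interfaces): an out-of-tree quantization method registers a scale parameter on the attention layer, and the backend impl, which now receives the layer in forward(), reads that parameter back.

```python
# Hypothetical sketch of the intended flow; names are illustrative only.
import torch


class MyQuantAttentionMethod:
    """Out-of-tree quantization method for the attention layer."""

    def create_weights(self, layer: torch.nn.Module) -> None:
        # Register the scale on the attention module, analogous to what the
        # Linear/MoE quantization methods do for their layers.
        layer.register_parameter(
            "_k_scale",
            torch.nn.Parameter(torch.ones(1), requires_grad=False))


class MyBackendImpl:
    """Custom attention backend implementation."""

    def forward(self, layer: torch.nn.Module,
                query: torch.Tensor) -> torch.Tensor:
        # Other arguments (key, value, kv_cache, metadata) omitted for brevity.
        # Because the attention layer is now passed in, the backend can read
        # parameters registered by the quantization method directly.
        k_scale = layer.get_parameter("_k_scale")
        return query * k_scale
```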