[Quant] Add SupportsQuant to phi3 and clip #13104
Conversation
Signed-off-by: Kyle Sayers <[email protected]>
I like this idea, but why does it only apply to these two models?
@@ -441,3 +442,16 @@ def supports_cross_encoding(
    model: Union[Type[object], object],
) -> Union[TypeIs[Type[SupportsCrossEncoding]], TypeIs[SupportsCrossEncoding]]:
    return is_pooling_model(model) and _supports_cross_encoding(model)


class SupportsQuant:
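For readers without the full diff, here is a minimal sketch of what a mixin like this could look like. The attribute names follow this PR's description (packed_modules_mapping, quant_config); the body and the stand-in QuantizationConfig are illustrative assumptions, not the exact implementation:

```python
from typing import Any, ClassVar, Dict, List, Optional


class QuantizationConfig:
    """Stand-in for vLLM's quantization config base class (assumption:
    the real class lives under vllm.model_executor.layers.quantization)."""

    def __init__(self) -> None:
        self.packed_modules_mapping: Dict[str, List[str]] = {}


class SupportsQuant:
    """Sketch of the mixin: merge the model's packed_modules_mapping
    into the quant config and expose the config as `quant_config`."""

    packed_modules_mapping: ClassVar[Dict[str, List[str]]] = {}
    quant_config: Optional[QuantizationConfig] = None

    def __new__(cls, *args: Any, **kwargs: Any) -> "SupportsQuant":
        instance = super().__new__(cls)
        quant_config = cls._find_quant_config(*args, **kwargs)
        if quant_config is not None:
            # Merge at init time, so submodels (e.g. CLIP nested inside a
            # multimodal model) can contribute mappings that are not known
            # before initialization.
            quant_config.packed_modules_mapping.update(
                cls.packed_modules_mapping)
            instance.quant_config = quant_config
        return instance

    @staticmethod
    def _find_quant_config(*args: Any,
                           **kwargs: Any) -> Optional[QuantizationConfig]:
        # Scan constructor arguments for a quantization config
        # (illustrative lookup; the real one may differ).
        for arg in (*args, *kwargs.values()):
            if isinstance(arg, QuantizationConfig):
                return arg
        return None
```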
> why does it only apply to these two models?
Follow ups
- Add SupportsQuant mixin to all models which support quantization
This PR just introduces the mixin; separate PRs will be dedicated to integrating the mixin with model-specific tests
Signed-off-by: Kyle Sayers <[email protected]>
Purpose
- Introduce SupportsQuant, which handles packed_modules_mapping updates and sets the quant_config attribute at init time
- This will allow the removal of the configure_quant_config function after the mixin has been added to a sufficient number of models
- configure_quant_config assumes that all packed_modules_mappings will be declared prior to initialization. In reality, some models are submodels of each other, so their mappings can only be determined at init time.

Changes
- Implement SupportsQuant with packed_modules_mapping updating and setting the quant_config attribute
- Add SupportsQuant to Phi3 and Clip models in order to demonstrate usefulness (see the sketch after this list)
- Add QuantizationConfig.packed_modules_mapping
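To make the Changes concrete, opting a model into the mixin could look like the sketch below. This assumes the SupportsQuant sketch from the conversation above, and the mapping shown is the common q/k/v and gate/up packing used for illustration, not a quote of the actual Phi3 model file:

```python
from torch import nn


class Phi3ForCausalLM(nn.Module, SupportsQuant):
    # Declared on the class so the mixin can merge it into the
    # quant config when the model is constructed.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }
```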
Follow ups
- Add SupportsQuant mixin to all models which support quantization
- Add a runtime check for SupportsQuant, similar to LoRA (see the sketch below)
- Remove configure_quant_config after models have been updated
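For the runtime-check follow-up, a helper mirroring supports_cross_encoding from the diff above might look like this. The name supports_quant is hypothetical and not part of this PR, and the body assumes the SupportsQuant class is in scope:

```python
from typing import Type, Union

from typing_extensions import TypeIs


def supports_quant(
    model: Union[Type[object], object],
) -> Union[TypeIs[Type["SupportsQuant"]], TypeIs["SupportsQuant"]]:
    # Hypothetical runtime check, analogous to the LoRA and
    # cross-encoding checks in interfaces.py.
    if isinstance(model, type):
        return issubclass(model, SupportsQuant)
    return isinstance(model, SupportsQuant)
```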
Testing
- phi3_example.py
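The example script itself is not included in this page; a smoke test along these lines would exercise the Phi3 path (the model name and options are assumptions):

```python
from vllm import LLM, SamplingParams

# Hypothetical smoke test: load a Phi-3 checkpoint and generate once to
# confirm the model still initializes with the SupportsQuant mixin.
llm = LLM(model="microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)
params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["The quantization config is"], params)
print(outputs[0].outputs[0].text)
```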