[V1][Kernel] Refactor the prefix_prefill kernel so that the caller no longer has to pass in the context lengths #13095

SageMoore · 2025-02-11T15:17:50Z

This patch changes the prefix_prefill kernel so that it computes each sequence's context length from the query length and the sequence length, both of which are already passed in (context length = sequence length minus query length). This makes the kernel easier to use on V1, where the attention metadata does not track a context-lengths tensor.

Signed-off-by: Sage Moore <[email protected]>
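The derivation the kernel now performs internally can be sketched in plain Python. This is a minimal sketch, assuming that a sequence's length counts its cached prefix plus its new query tokens, and that per-sequence query lengths are recovered from cumulative start offsets into the flattened batch. The names `compute_context_lens`, `query_start_loc`, and `seq_lens` are hypothetical, not the kernel's actual arguments; the kernel itself does the equivalent subtraction per sequence on the device rather than in a Python loop.

```python
from typing import List


def compute_context_lens(query_start_loc: List[int], seq_lens: List[int]) -> List[int]:
    """Derive per-sequence context (cached prefix) lengths.

    Hypothetical helper for illustration only.
    query_start_loc: cumulative offsets of each sequence's query tokens in the
        flattened batch; has batch_size + 1 entries, starting at 0.
    seq_lens: total tokens per sequence (cached prefix + new query tokens).
    """
    if len(query_start_loc) != len(seq_lens) + 1:
        raise ValueError("query_start_loc must have batch_size + 1 entries")

    ctx_lens = []
    for i, seq_len in enumerate(seq_lens):
        # Number of new (query) tokens scheduled for sequence i.
        query_len = query_start_loc[i + 1] - query_start_loc[i]
        # Every token that is not part of the query is already in the KV cache.
        ctx_len = seq_len - query_len
        if ctx_len < 0:
            raise ValueError(f"sequence {i}: query_len {query_len} exceeds seq_len {seq_len}")
        ctx_lens.append(ctx_len)
    return ctx_lens
```

For example, with `query_start_loc = [0, 3, 5, 9]` and `seq_lens = [10, 2, 4]`, the query lengths are 3, 2, and 4, so the context lengths are 7, 0, and 0. Because both inputs are already handed to the kernel, no separate context-lengths tensor has to be built or stored in the attention metadata.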



SageMoore added 2 commits February 10, 2025 23:24

init

95df571

Signed-off-by: Sage Moore <[email protected]>

init

0bfe435

Signed-off-by: Sage Moore <[email protected]>

SageMoore marked this pull request as ready for review February 11, 2025 15:32

SageMoore requested review from tlrmchlsmth and WoosukKwon as code owners February 11, 2025 15:32

SageMoore mentioned this pull request Feb 11, 2025

[ROCm][V1] Add intial ROCm support to V1 #12790

Merged

SageMoore added 2 commits February 11, 2025 19:12

minor fix

d2f3c85

Signed-off-by: Sage Moore <[email protected]>

Merge branch 'main' of https://github.com/neuralmagic/vllm into sage/prefix-prefill-refactor

c7497f3
