Support logit_bias in v1 Sampler #13079
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run full CI to test the changes comprehensively before merging. 🚀
Thanks for the PR! Left some minor comments.
```python
for token_id, bias in logit_bias.items():
    logits[i, token_id] += bias
```
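For context, a minimal self-contained sketch of what this per-request loop does (toy shapes and a made-up `logit_bias_per_req` list, not the actual vLLM sampler code):

```python
import torch

# Toy batch: logits of shape [num_reqs, vocab_size] and one optional
# {token_id: bias} dict per request.
logits = torch.zeros(2, 10)
logit_bias_per_req = [{3: 5.0, 4: -5.0}, None]

for i, logit_bias in enumerate(logit_bias_per_req):
    if not logit_bias:
        continue
    # Python-level loop over every (token_id, bias) pair; correct but slow
    # when many requests carry large bias dicts.
    for token_id, bias in logit_bias.items():
        logits[i, token_id] += bias
```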
Can we add a comment like `TODO: This is extremely slow. Optimize this.`?
@njhill @robertgshaw2-redhat Although this implementation is a bit slow, I'm comfortable merging the PR since I haven't found a way to optimize it yet, and getting the functionality in is our top priority. What do you think?
Yeah, I agree. We could write a custom kernel in C++ to handle this. We could also change the representation of logit_bias from a dict to key-value pairs.
I can create some TODOs as a follow-up.
@houseroad Sounds good. Could you please update the PR?
@WoosukKwon I think that's fine since we need the functionality ASAP. But I think it should be simple to vectorize this without any custom kernel.
We just need to maintain three one-dimensional tensors of the same length in the batch:
- all the bias values concatenated (b)
- corresponding request indices (s)
- corresponding token ids (t)
We only need to update these when any requests with logit bias are added or removed from the batch.
Then we can just do logits[(s, t)] += b
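A rough sketch of that idea (the helper name `build_bias_tensors` and the toy shapes are illustrative, not vLLM code):

```python
import torch

def build_bias_tensors(logit_bias_per_req, device="cpu"):
    """Flatten per-request {token_id: bias} dicts into three 1-D tensors."""
    req_indices, token_ids, bias_values = [], [], []
    for req_idx, logit_bias in enumerate(logit_bias_per_req):
        if not logit_bias:
            continue
        for token_id, bias in logit_bias.items():
            req_indices.append(req_idx)
            token_ids.append(token_id)
            bias_values.append(bias)
    return (torch.tensor(req_indices, dtype=torch.long, device=device),
            torch.tensor(token_ids, dtype=torch.long, device=device),
            torch.tensor(bias_values, dtype=torch.float32, device=device))

# Rebuilt only when requests with logit_bias are added to / removed from the batch.
logit_bias_per_req = [{5: 2.0, 7: -1.0}, {}, {3: 0.5}]
s, t, b = build_bias_tensors(logit_bias_per_req)

logits = torch.zeros(3, 10)
logits[s, t] += b  # one vectorized update for the whole batch
```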
Updated the PR. Please let me know if you want me to jump directly to the optimized solution.
@njhill @houseroad Considering that the code is pretty isolated, I think we can merge the PR first and have a follow-up PR to optimize it.
@njhill Re your idea: each request's logit_bias has a different length. How do we handle that (with the persistent batch)?
@WoosukKwon I could be missing something... let's merge this and then I can open another PR :)
I think we can use a ragged-format representation with 3 tensors: tensor a for the length/offset of each batch entry, tensor b for token ids, and tensor c for biases. Then one torch op takes these 3 tensors as inputs, and we leverage C++ logic to handle it. It should be much faster than the current logic.
Maybe in the SamplingMeta or param, we should just preprocess things like this.
The additional overhead is that once we update the batch, we may need to generate new tensors, which should be acceptable.
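A possible sketch of that ragged (offset-based) layout in plain PyTorch, just to illustrate the data model such a C++ op could consume; names and shapes here are assumptions, not the actual follow-up implementation:

```python
import torch

# Per-request {token_id: bias} dicts, as they arrive in sampling params.
logit_bias_per_req = [{5: 2.0, 7: -1.0}, {}, {3: 0.5}]

# Ragged/CSR-style representation: offsets[i]:offsets[i+1] selects the
# token_ids/bias slice belonging to request i.
lengths = torch.tensor([len(d) for d in logit_bias_per_req])
offsets = torch.cat([torch.zeros(1, dtype=torch.long), lengths.cumsum(0)])
token_ids = torch.tensor([t for d in logit_bias_per_req for t in d])
bias = torch.tensor([v for d in logit_bias_per_req for v in d.values()])

# A custom C++/CUDA op could take (offsets, token_ids, bias) directly; in
# pure PyTorch the same data can be applied by expanding request indices.
req_idx = torch.repeat_interleave(torch.arange(len(lengths)), lengths)
logits = torch.zeros(len(lengths), 10)
logits[req_idx, token_ids] += bias
```

As noted above, these tensors only need to be regenerated when the batch composition changes, which keeps the steady-state cost low.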
LGTM. Thanks for the PR!
@houseroad Seems like we need to fix a unit test: https://buildkite.com/vllm/ci/builds/13363#0194fe70-1618-4cb6-ae2d-8380ee8a2041 😓
Introduce logit_bias support in v1 Sampler.
Tested with new test cases:
pytest tests/v1/sample/test_sampler.py -k "test_sampler_logit_bias"
pytest tests/v1/worker/test_gpu_input_batch.py