Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix edge case for continue_final_message #36404

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Rocketknight1
Copy link
Member

@Rocketknight1 Rocketknight1 commented Feb 25, 2025

The code for continue_final_message in apply_chat_template fails in the following edge case:

  • The chat template trims trailing spaces from messages
  • The chat template adds its own spacing after messages
  • The unstripped content of the final message is identical to the stripped content of a previous message plus the template's built-in spacing

This causes the template to incorrectly identify the previous message as the final message and strip off a big chunk of the conversation after this point. This is a fairly uncommon edge case, but it's worth fixing nonetheless! cc @TK-21st

This PR handles things more elegantly without a try-except block, and also improves handling when assistant messages contain text/image blocks (which is rare right now, but will probably become more common in future)

Fixes #35433

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1
Copy link
Member Author

cc @Cyrilvallez @ArthurZucker for core maintainer review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tokenizers.apply_chat_template with continue_final_message=True with trailing spaces in input
2 participants