`tokenizers.apply_chat_template` with `continue_final_message=True` fails with trailing spaces in input #35433
Comments
Hi @chuyishang, this should be fixed now! Please try updating to the latest version, or installing from
@Rocketknight1 When using multi-round few-shot such as
The rendered chat with
This is because the last round. I'm not sure what would be the best way to remedy this. Maybe
@TK-21st yes, you're completely correct and I hadn't considered that! Working on a high-priority fix now.
System Info
As the title says, `tokenizers.apply_chat_template` fails with trailing spaces in input for Llama-3.1-Instruct. If the last `assistant` message has a trailing space, such as `{'role': 'assistant', 'content': 'some text '}`, and `continue_final_message` is True, it throws a `ValueError: substring not found`.

This is because the `apply_chat_template` function contains the line

```python
rendered_chat = rendered_chat[: rendered_chat.rindex(final_message) + len(final_message)].rstrip()
```

but `rendered_chat` ends with `"some text<|eot_id|>"` while `final_message` still has the trailing space: `"some text "`.
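The failing lookup can be reproduced in isolation with plain strings (a minimal sketch; the `rendered_chat` value here is a hypothetical stand-in for what the Llama 3.1 chat template actually emits):

```python
# final_message keeps the user's trailing space...
final_message = "some text "
# ...but the chat template strips trailing whitespace before appending
# <|eot_id|>, so the exact substring "some text " never appears here.
rendered_chat = (
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "some text<|eot_id|>"
)

try:
    rendered_chat.rindex(final_message)
except ValueError as e:
    print(e)  # substring not found
```

Because `str.rindex` (unlike `str.rfind`) raises instead of returning -1 when the substring is absent, the mismatch surfaces as the `ValueError` quoted above.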
Who can help?
@ArthurZucker @itazap
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)

Reproduction
See above
Expected behavior
I expect it to be able to continue after the trailing space.
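One way the matching could be made tolerant of trailing whitespace is to strip the final message the same way the template does before searching. This is a sketch of the idea only, not necessarily the fix the maintainers shipped; `final_message` and `rendered_chat` are hypothetical stand-ins:

```python
final_message = "some text "  # user content with a trailing space
rendered_chat = (
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "some text<|eot_id|>"
)

# Strip the final message before matching, so rindex() finds it even
# when the input had trailing whitespace the template removed.
trimmed = final_message.rstrip()
end = rendered_chat.rindex(trimmed) + len(trimmed)
rendered_chat = rendered_chat[:end]
print(rendered_chat)  # ends with "some text", ready to be continued
```

With the trailing `<|eot_id|>` cut off, generation can resume directly after the assistant's partial text, which is the point of `continue_final_message=True`.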