Certain ONNX models ignore the system prompt #1172
Where/how do you set your system prompt?
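(For reference, the standard way to pass a system prompt in Transformers.js is via the chat messages array. A minimal sketch, with an illustrative model id:)

```js
import { pipeline } from "@huggingface/transformers";

// Illustrative model id; any chat model whose template supports a system role works.
const generator = await pipeline(
  "text-generation",
  "HuggingFaceTB/SmolLM2-1.7B-Instruct",
);

const messages = [
  { role: "system", content: "You are a pirate. Always answer in pirate speak." },
  { role: "user", content: "What is the capital of France?" },
];

// The pipeline applies the tokenizer's chat template, so the system
// turn is prepended to the prompt before generation.
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text);
```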
Could you provide more information about the problem you are facing? Is the model producing incorrect results?
It has no awareness of the system prompt. This could possibly be due to quantisation: SmolLM was quantized with some calibration samples and doesn't seem to have the issue.
Can you please provide an example of the input/output you are seeing? It may be that the model itself doesn't support a system role, which you can check by looking at the chat template in the tokenizer_config.json file.
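(One way to run that check is to render the chat template directly with the tokenizer. A minimal sketch, with an illustrative model id:)

```js
import { AutoTokenizer } from "@huggingface/transformers";

// Illustrative model id; use the tokenizer of the model under test.
const tokenizer = await AutoTokenizer.from_pretrained(
  "HuggingFaceTB/SmolLM2-1.7B-Instruct",
);

const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Hi!" },
];

// Render the chat template as plain text; if the system turn is missing
// from the output, the template (not the runtime) is dropping it.
const prompt = tokenizer.apply_chat_template(messages, { tokenize: false });
console.log(prompt);
```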
Sure, here is a full repo: https://github.com/TrelisResearch/llama-system-prompt-issue. By the way, good point on checking the tokenizer; the system prompt is indeed in the chat template.
This may just be a limitation of the model itself. Are you able to get good performance with the Python library? It may be good to use that as a benchmark for the model's capabilities.
Yeah, the models themselves work fine with the Python transformers library and follow instructions.
Hey folks, I'm seeing the same issue here. With onnxruntime via Python the model follows the system prompt, but not with Android's onnxruntime. Both run the same model. My hypothesis is that the attention mask is not being set properly in the Android version.
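(If that hypothesis is right, constructing the mask explicitly should change the output. A minimal onnxruntime-web sketch of a single forward pass; the Android Java/Kotlin API differs, but the model itself takes the same named inputs. The input/output names and placeholder values here are assumptions, as noted in the comments:)

```js
import * as ort from "onnxruntime-web";

// Hypothetical single forward pass. The names "input_ids", "attention_mask",
// and "logits" follow common Hugging Face ONNX exports and are assumptions;
// real decoder exports usually also expect position_ids and past_key_values,
// omitted here for brevity.
const session = await ort.InferenceSession.create("model.onnx");

const ids = [1n, 2n, 3n]; // placeholder token ids from the tokenizer
const dims = [1, ids.length];
const inputIds = new ort.Tensor("int64", BigInt64Array.from(ids), dims);

// All-ones mask: every prompt token, including the system turn, is attended to.
const attentionMask = new ort.Tensor(
  "int64",
  BigInt64Array.from(ids.map(() => 1n)),
  dims,
);

const results = await session.run({
  input_ids: inputIds,
  attention_mask: attentionMask,
});
console.log(results.logits.dims);
```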
System Info
Here's a model that follows the system prompt:
Here are two that do not:
Is this intentional or accidental?
Environment/Platform
Description
I'm running these models in q4f16 with webgpu
Reproduction
I'm following the SmolLM example provided in the examples, but swapping in the model.
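(For concreteness, a minimal sketch of that setup; the model id is a placeholder for the affected models:)

```js
import { pipeline } from "@huggingface/transformers";

// Placeholder model id; substitute one of the models that ignores the prompt.
const generator = await pipeline(
  "text-generation",
  "onnx-community/Llama-3.2-1B-Instruct",
  { dtype: "q4f16", device: "webgpu" },
);

const messages = [
  { role: "system", content: "Answer in French only." },
  { role: "user", content: "What is the capital of France?" },
];

// A model that honours the system prompt should answer in French here.
const output = await generator(messages, { max_new_tokens: 64 });
console.log(output[0].generated_text);
```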