
Certain ONNX models ignore the system prompt #1172

Open

RonanKMcGovern opened this issue Jan 29, 2025 · 8 comments
Labels
bug Something isn't working

Comments

@RonanKMcGovern

System Info

Here's a model that follows the system prompt:

  • HuggingFaceTB/SmolLM2-1.7B-Instruct

Here are two that do not:

  • onnx-community/Llama-3.2-3B-Instruct-onnx-web-gqa
  • onnx-community/Qwen2.5-Coder-1.5B-Instruct

Is this intentional or accidental?

Environment/Platform

  • [x] Website/web-app
  • [ ] Browser extension
  • [ ] Server-side (e.g., Node.js, Deno, Bun)
  • [ ] Desktop app (e.g., Electron)
  • [ ] Other (e.g., VSCode extension)

Description

I'm running these models in q4f16 with WebGPU.

Reproduction

I'm following the SmolLM example provided in the examples directory, but swapping in the model; a minimal sketch follows.
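For reference, the swapped-model reproduction looks roughly like this (a sketch; the system instruction is illustrative, and the model id is one of the failing ones above):

```js
import { pipeline } from "@huggingface/transformers";

// Same pattern as the SmolLM2 example, with the model id swapped in.
const generator = await pipeline(
  "text-generation",
  "onnx-community/Llama-3.2-3B-Instruct-onnx-web-gqa",
  { dtype: "q4f16", device: "webgpu" },
);

const messages = [
  { role: "system", content: "Answer only in French." },
  { role: "user", content: "What is the capital of France?" },
];

const output = await generator(messages, { max_new_tokens: 128 });
// With the models above, the reply does not follow the system instruction.
console.log(output[0].generated_text.at(-1).content);
```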

RonanKMcGovern added the bug label Jan 29, 2025
@josedandrade

Where/How do you set your system prompt?

@xenova
Collaborator

xenova commented Feb 8, 2025

Could you provide more information about the problem you are facing? Is the model producing incorrect results?

@RonanKMcGovern
Author

RonanKMcGovern commented Feb 8, 2025 via email

@xenova
Collaborator

xenova commented Feb 8, 2025

Can you please provide an example of input/output that you are seeing? It may be that the model itself doesn't support a system role (which you can check by looking at the chat template in the tokenizer_config.json file).
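For example, one way to run that check in Transformers.js (a sketch; rendering the template without tokenizing shows whether the system text makes it into the prompt):

```js
import { AutoTokenizer } from "@huggingface/transformers";

const tokenizer = await AutoTokenizer.from_pretrained(
  "onnx-community/Llama-3.2-3B-Instruct-onnx-web-gqa",
);

// Render the chat template as a string instead of token ids.
const prompt = tokenizer.apply_chat_template(
  [
    { role: "system", content: "Answer only in French." },
    { role: "user", content: "Hello!" },
  ],
  { tokenize: false },
);

// For Llama 3.2, the system text should appear between
// <|start_header_id|>system<|end_header_id|> and <|eot_id|>.
console.log(prompt);
```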

@RonanKMcGovern
Author

Sure, here is a full repo: https://github.com/TrelisResearch/llama-system-prompt-issue

BTW, yes, good point on checking the tokeniser. Indeed, the system role is in the chat template.

@xenova
Collaborator

xenova commented Feb 9, 2025

This may just be a limitation of the model itself. Are you able to get good performance with the Python library? It may be good to use that as a benchmark for the model's capabilities.

@RonanKMcGovern
Author

RonanKMcGovern commented Feb 10, 2025 via email

@drlima

drlima commented Feb 20, 2025

Hey, folks.

I'm seeing the same issues here.

With onnxruntime in Python the model follows the system prompt, but with onnxruntime on Android it does not. Both runs use the same model.

My hypothesis is that the attention mask is not being properly set in the Android version.
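For reference, this is roughly what "properly set" means in onnxruntime-web terms (a sketch only, with a hypothetical model path and token ids; the Android Java/Kotlin bindings take the same named inputs, and a real decoder run also feeds position_ids and past key values):

```js
import * as ort from "onnxruntime-web";

// Hypothetical model path and token ids, for illustration only.
const session = await ort.InferenceSession.create("model.onnx");
const ids = [128000n, 9906n, 1917n];

const feeds = {
  input_ids: new ort.Tensor("int64", BigInt64Array.from(ids), [1, ids.length]),
  // Every real (non-padding) token must get a 1 here. If the mask is
  // zeroed or mis-sized, earlier tokens -- including the system
  // prompt -- are effectively invisible to attention.
  attention_mask: new ort.Tensor("int64", BigInt64Array.from(ids, () => 1n), [1, ids.length]),
};

const results = await session.run(feeds);
console.log(Object.keys(results)); // inspect output names, e.g. "logits"
```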
