
ZeroShotClassificationArgumentHandler should be explicit that it has somewhat unsafe internal behaviour #35874

Open
nicolasdalsass opened this issue Jan 24, 2025 · 4 comments · May be fixed by #35886
Labels
Feature request (Request for a new feature)

Comments

@nicolasdalsass

nicolasdalsass commented Jan 24, 2025

Feature request

Currently, ZeroShotClassificationArgumentHandler.__call__ will execute https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/zero_shot_classification.py#L41 , that is, it will call Python's str.format() on the provided hypothesis template to insert the label into it, while allowing the full str.format() placeholder syntax, which is quite broad.

For example, passing hypothesis_template = "{:>9999999999}" with any label will happily eat 100 GB of RAM, because the whole scope of Python formatting is allowed.
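
For illustration, here is a minimal sketch of the mechanism, using a small width so it is safe to run (the label "politics" is just an example):

```python
# str.format() honours the full format spec, including padding widths.
template = "{:>40}"                      # right-align the label in a field of width 40
print(len(template.format("politics")))  # -> 40: the label plus padding spaces

# With hypothesis_template = "{:>9999999999}", the same .format() call tries to
# build a string of roughly 10^10 padding characters, which is what exhausts RAM.
```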

This is not made clear anywhere, but library users need to know that they have to sanitize these inputs very carefully.

I think that at least the docstring of the class, and ideally the reference documentation for "hypothesis_template" at https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.zero_shot_classification , should be updated to mention this. It is quite important for users of the library, in particular because this parameter will naturally tend to be user-facing in the end.

Alternatively, this call could accept only {} as a placeholder; it is hard to see a legitimate use case for exotic formatting of labels in the hypothesis template.
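
As a rough sketch of what "only {}" could look like (not the actual change, and the helper name is made up), string.Formatter can be used to reject anything other than a bare placeholder:

```python
import string

def check_hypothesis_template(template: str) -> None:
    """Hypothetical helper: reject anything other than plain '{}' placeholders."""
    for _, field_name, format_spec, conversion in string.Formatter().parse(template):
        if field_name is None:
            continue  # a chunk of literal text with no placeholder
        if field_name != "" or format_spec or conversion:
            raise ValueError("hypothesis_template may only contain plain '{}' placeholders")

check_hypothesis_template("This example is {}.")  # ok
# check_hypothesis_template("{:>9999999999}")     # would raise ValueError
```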

Thanks :-)

Motivation

I think it's good to help the internet be a safer place in general :-)

Your contribution

It's unclear to me whether I can contribute to the documentation on huggingface.co.

I could contribute a fix to transformers that is stricter about the allowed hypothesis_template, though, if you want to take this route (I'm pretty sure even an AI model could contribute the two lines needed...).

@nicolasdalsass added the Feature request label on Jan 24, 2025
@Rocketknight1
Member

Hi @nicolasdalsass, this is a good point! I don't think the intention here was to allow the full range of Python formatting behaviour; it was just intended as a simple string insertion. If you'd like to make the PR to tighten up security here, I think we'd be happy to review/accept it. It's up to you whether you use an AI or write it yourself 😅

@sambhavnoobcoder

@Rocketknight1 saw this issue and raised a fix in PR #35886. Hope you can review it and merge it if the issue is resolved. I'll address any comments you have on it as well if required.

Thank you @nicolasdalsass for raising the issue. 🤗

@nicolasdalsass
Author

@sambhavnoobcoder Thanks for the very quick PR on the issue :-) I had a look; only allowing {} seems good and safe to me :-)

@sambhavnoobcoder

sambhavnoobcoder commented Jan 25, 2025

It was my pleasure. I've been actively looking through open issues in the transformers library and contributing PRs where I can help. Looking forward to seeing this merged :-)
