Hi @nicolasdalsass, this is a good point! I don't think the intention here was to allow the full range of Python formatting behaviour, it was just intended as a simple string insertion. If you'd like to make the PR to tighten up security here, I think we'd be happy to review/accept it. It's up to you whether you use an AI or write it yourself 😅
@Rocketknight1 I saw this issue and raised a fix in PR #35886. Hope you can review it and merge if the issue is resolved. I'll address any comments you have on it as well if required.
It was my pleasure. I've been actively looking through open issues in the transformers library and contributing PRs where I can help. Looking forward to seeing this merged :-)
Feature request
Currently, `ZeroShotClassificationArgumentHandler.__call__` executes https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/zero_shot_classification.py#L41, that is, it calls Python's `str.format()` on the provided hypothesis template to insert the label, while allowing the full extent of the `.format()` placeholder syntax, which is quite large. For example, passing `hypothesis_template = "{:>9999999999}"` with any label will happily eat 100 GB of RAM, because the whole scope of Python formatting is allowed. This is not made clear anywhere, but library users need to know they have to sanitize those inputs very carefully.
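To see the mechanism at a safe scale, here is a minimal sketch using a small padding width as a stand-in for the pathological `{:>9999999999}` template: the width in the format spec directly controls the size of the string that `str.format()` allocates.

```python
# The padding width in a format spec determines the length of the
# resulting string. With width 50 this is harmless; with width
# 9999999999 (as in the report), format() tries to allocate a
# multi-gigabyte string before the caller can react.
template = "{:>50}"  # small stand-in for "{:>9999999999}"
result = template.format("politics")

assert len(result) == 50          # output length equals the padding width
assert result.endswith("politics")  # label is right-aligned into the padding
```

The same allocation happens before any model code runs, which is why an attacker-controlled template is enough to exhaust memory.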
I think that at least the docstring of the class, and ideally the reference doc for `hypothesis_template` at https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.zero_shot_classification, should be updated to mention this. It's quite important for users of the library, particularly for parameters that will naturally tend to end up user-facing.
Alternatively, this call could accept only `{}` as a placeholder; it's hard to see a legitimate use case for exotic formatting of labels in the hypothesis template.
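One way to implement the stricter behaviour suggested above could be to validate the template before formatting. The helper below is a hypothetical sketch (not part of transformers, and `validate_hypothesis_template` is an illustrative name): it uses the standard library's `string.Formatter.parse` to reject any placeholder that carries a field name, conversion, or format spec, so only bare `{}` survives.

```python
from string import Formatter

def validate_hypothesis_template(template: str) -> None:
    # Hypothetical helper: allow only bare "{}" placeholders, rejecting
    # format specs like "{:>9999999999}" that let str.format() allocate
    # arbitrarily large padded strings.
    # Formatter().parse yields (literal_text, field_name, format_spec,
    # conversion) tuples; field_name is None for trailing literal text.
    fields = [f for f in Formatter().parse(template) if f[1] is not None]
    if not fields:
        raise ValueError("hypothesis_template must contain a '{}' placeholder")
    for _literal, field_name, format_spec, conversion in fields:
        if field_name != "" or format_spec != "" or conversion is not None:
            raise ValueError(
                "only bare '{}' placeholders are allowed in hypothesis_template"
            )

validate_hypothesis_template("This example is {}.")  # passes silently
try:
    validate_hypothesis_template("{:>9999999999}")
    blocked = False
except ValueError:
    blocked = True  # the pathological template is rejected
```

This keeps the normal `"This example is {}."` usage working while refusing padding tricks, explicit argument indices like `{0}`, and conversions like `{!r}`.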
Thanks :-)
Motivation
I think it's good to help the internet be a safer place in general :-)
Your contribution
It's unclear to me whether I can contribute to the documentation on huggingface.co.
I could contribute a fix to be stricter about the allowed `hypothesis_template` in transformers, though, if you want to take this route (I'm pretty sure even an AI model could contribute the two lines needed...).