Pass@k #519

Open
wants to merge 4 commits into
base: main

Conversation

clefourrier
Member

No description provided.

strip_strings: bool = False,
sample_scoring_function: Union[Callable[[str, str], float], str] = None,
):
"""Computing pass at k
Member Author

I made it as exhaustive and customizable as the other metrics: by default each individual prediction is scored by full exact match, and there are options to normalize strings (useful for math evals, for example). I can remove some options if you feel that's too much complexity.
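To make the default and the normalization options concrete, here is a minimal sketch of per-sample scoring. It is not the code from this PR: `strip_strings` appears in the diff above, while the `normalize` parameter and the `drop_spaces_and_dollars` helper are hypothetical names introduced only for illustration.

```python
from typing import Callable, Optional


def score_sample(
    gold: str,
    pred: str,
    strip_strings: bool = False,
    normalize: Optional[Callable[[str], str]] = None,  # hypothetical option name
) -> int:
    """Return 1 if the (optionally normalized) prediction equals the gold, else 0."""
    if normalize is not None:
        gold, pred = normalize(gold), normalize(pred)
    if strip_strings:
        gold, pred = gold.strip(), pred.strip()
    return 1 if gold == pred else 0


def drop_spaces_and_dollars(text: str) -> str:
    # Toy normalizer for math-style answers; purely illustrative.
    return text.replace("$", "").replace(" ", "")


print(score_sample("42", " 42 "))                      # strict exact match
print(score_sample("42", " 42 ", strip_strings=True))  # whitespace ignored
print(score_sample("x+1", "$x + 1$", normalize=drop_spaces_and_dollars))
```

With no options set, this reduces to the strict `gold == pred` comparison used as the default.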

self.score_sample = self.default_sample_scoring

def compute(self, golds: list[str], predictions: list[str], **kwargs) -> dict[str, float]:
"""Computes the metric over a list of golds and predictions for one single item with possibly many samples.
Member Author

Core logic here

return 1 if gold == pred else 0

def pass_at_k(self, all_scores: list[int]) -> float:
"""Algo from https://arxiv.org/pdf/2107.03374"""
Member Author

Pass@k here, literally the one from the Codex paper
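For readers without the paper at hand, a minimal sketch of the unbiased pass@k estimator from https://arxiv.org/pdf/2107.03374, written as a standalone function rather than the method in this PR. The name `estimate_pass_at_k` and its `(n, c, k)` signature are assumptions made for illustration; in the PR, `n` and `c` would presumably be derived from `all_scores`. The paper's reference code uses NumPy, whereas this version uses only the standard library.

```python
import math


def estimate_pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k: 1 - C(n - c, k) / C(n, k).

    n: total number of samples drawn for one item
    c: number of those samples that are correct
    k: the k in pass@k (assumed k <= n)
    """
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    # Product form avoids computing large binomial coefficients directly.
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))


# Hypothetical per-sample scores (1 = correct, 0 = incorrect) for one item.
all_scores = [1, 0, 0, 1, 0]
print(estimate_pass_at_k(n=len(all_scores), c=sum(all_scores), k=2))
```

For `k=1` the estimate reduces to the fraction of correct samples, `c / n`.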

@HuggingFaceDocBuilderDev
Collaborator

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@clefourrier clefourrier requested a review from NathanHB January 27, 2025 10:01
@hynky1999
Collaborator

hynky1999 commented Jan 27, 2025

Wouldn't it be best to make it dynamic?
E.g. you just wrap an existing metric with it, so that it's more flexible?
Then I could create it like this, for example:
metric=pass_at_k(my_existing_metric, k=10)

@clefourrier
Member Author

You would be able to create a custom metric like so, with a custom sample level metric:

    your_custom_pass_at = SampleLevelMetric(
        metric_name="pass@",
        sample_level_fn=PassAtK(k=10, sample_scoring_function=my_existing_metric).compute,
        category=MetricCategory.GENERATIVE_SAMPLING,
        use_case=MetricUseCase.REASONING,
        corpus_level_fn=np.mean,
        higher_is_better=True,
    )

Unless you need something else?

@hynky1999
Collaborator

Ahhh, I missed that arg. Good by me; I thought it was exclusively for string-to-string comparison.
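To illustrate the point settled in this thread, here is a rough, self-contained sketch of how a scorer that is not plain string equality could be plugged in. `PassAtKSketch` is a hypothetical stand-in and not the PR's `PassAtK` class; only the `k` and `sample_scoring_function` argument names come from the snippet above. `numeric_match` is an invented example of an "existing metric", and the sketch assumes a single gold per item.

```python
import math
from typing import Callable, Optional


class PassAtKSketch:
    """Standalone stand-in for the PR's PassAtK; not the actual implementation."""

    def __init__(
        self,
        k: int,
        sample_scoring_function: Optional[Callable[[str, str], float]] = None,
    ):
        self.k = k
        # Fall back to exact string match when no scorer is supplied.
        self.score_sample = sample_scoring_function or (
            lambda gold, pred: float(gold == pred)
        )

    def compute(self, golds: list[str], predictions: list[str]) -> float:
        gold = golds[0]  # assumption: a single gold per item
        n = len(predictions)
        # A sample counts as correct when the scorer returns a non-zero value.
        c = sum(1 for pred in predictions if self.score_sample(gold, pred))
        if n - c < self.k:
            return 1.0
        return 1.0 - math.prod(1.0 - self.k / i for i in range(n - c + 1, n + 1))


def numeric_match(gold: str, pred: str) -> float:
    # Hypothetical existing metric: compares answers as numbers, not strings.
    try:
        return float(math.isclose(float(gold), float(pred)))
    except ValueError:
        return 0.0


predictions = ["1/2", "0.50", ".5", "0.4"]
wrapped = PassAtKSketch(k=2, sample_scoring_function=numeric_match)
print(wrapped.compute(["0.5"], predictions))             # numeric comparison
print(PassAtKSketch(k=2).compute(["0.5"], predictions))  # exact string match
```

With the numeric scorer, `"0.50"` and `".5"` count as correct, whereas the default exact match accepts none of the four samples.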


3 participants