
Additional Metrics for uncertainty evaluation #551

Open
gmartinonQM opened this issue Dec 5, 2024 · 0 comments
Labels: Needs decision (the MAPIE team is deciding what to do next), Other or internal (if no other grey tag is relevant or if the issue is from the MAPIE team)

Comments

@gmartinonQM
Collaborator

gmartinonQM commented Dec 5, 2024

Hi all, I recently came across this paper: https://arxiv.org/abs/2305.19187

It introduces two interesting metrics for uncertainty evaluation.

The two business questions addressed are:

  1. Is my uncertainty predictive of my errors? Larger uncertainties are expected to correlate with higher error rates.
  2. How many errors do I avoid if I reject predictions above an uncertainty cut-off? In the spirit of selective regression/classification/generation, the error rate is expected to decrease if I delegate high-uncertainty cases to humans (or to the dustbin).

The two corresponding metrics are quite easy to implement (a minimal sketch follows this list):
1. The AUROC(y_wrong, y_uncertainty), where y_wrong is 1 if the prediction is wrong and y_uncertainty is simply the prediction uncertainty. This directly measures the ability of the uncertainties to rank the wrong responses highest (in expectation).
2. The AUARC (Area Under the Accuracy-Rejection Curve), i.e. the area under the curve of accuracy on the retained samples as a function of the rejection rate (or uncertainty cut-off).
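
A minimal sketch of both metrics, assuming NumPy arrays of true labels, point predictions, and per-sample uncertainty scores. The function names (`error_auroc`, `auarc`) are illustrative and not part of any existing MAPIE API:

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def error_auroc(y_true, y_pred, y_uncertainty):
    """AUROC measuring how well the uncertainty ranks wrong predictions."""
    # 1 where the point prediction is wrong, 0 otherwise.
    y_wrong = (np.asarray(y_true) != np.asarray(y_pred)).astype(int)
    return roc_auc_score(y_wrong, y_uncertainty)


def auarc(y_true, y_pred, y_uncertainty):
    """Area Under the Accuracy-Rejection Curve (most uncertain rejected first)."""
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    order = np.argsort(-np.asarray(y_uncertainty))  # most uncertain first
    correct_sorted = correct[order]
    n = len(correct_sorted)
    # Accuracy on the samples retained after rejecting the k most uncertain
    # ones, averaged over rejection rates k/n for k = 0 .. n-1.
    accuracies = [correct_sorted[k:].mean() for k in range(n)]
    return float(np.mean(accuracies))
```

Note that `roc_auc_score` requires the evaluation set to contain at least one correct and one wrong prediction.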

Beyond these two basic metrics, we could push the concept further to:
• A precision-recall curve
• "Mondrianized" metrics, with an additional groups parameter, allowing the analysis to be stratified by group (see the sketch after this list)
• Extensive utilities to plot diagnostic curves with plotly (much as sklearn does), with additional information (e.g. the curve of a perfect/random model)
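
For the "Mondrianized" variant, a minimal sketch of what the stratification could look like, assuming a `groups` array aligned with the samples; `groupwise_metric` is a hypothetical helper, and `metric` can be any of the functions sketched above:

```python
import numpy as np


def groupwise_metric(metric, y_true, y_pred, y_uncertainty, groups):
    """Compute `metric` separately on each group and return {group: value}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    y_uncertainty, groups = np.asarray(y_uncertainty), np.asarray(groups)
    return {
        g: metric(
            y_true[groups == g], y_pred[groups == g], y_uncertainty[groups == g]
        )
        for g in np.unique(groups)
    }
```

For example, `groupwise_metric(error_auroc, y_true, y_pred, y_uncertainty, groups)` would return one AUROC per group.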

I think this would elegantly complement the existing coverage_scores metrics with metrics closer to business considerations. Moreover, these metrics are almost use-case agnostic, since the user can quite easily compute y_wrong as a function of y_true and y_pred, and y_uncertainty as a function of y_pis (e.g. y_uncertainty = y_pis.sum(axis=1) for multiclass classification, which is the size of the prediction set).
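
As an illustration, a minimal sketch of deriving these two inputs in a multiclass setting, assuming `y_ps` is a boolean prediction-set array of shape (n_samples, n_classes) for a single alpha; the helper name is hypothetical:

```python
import numpy as np


def wrongness_and_uncertainty(y_true, y_pred, y_ps):
    """Derive y_wrong (1 if the point prediction is wrong) and y_uncertainty
    (the prediction-set size) from labels, point predictions, and boolean
    prediction sets of shape (n_samples, n_classes)."""
    y_wrong = (np.asarray(y_true) != np.asarray(y_pred)).astype(int)
    y_uncertainty = np.asarray(y_ps).sum(axis=1)  # number of classes kept in the set
    return y_wrong, y_uncertainty
```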

Happy to discuss this further!

@jawadhussein462 added the Enhancement, Other or internal, and Needs decision labels on Dec 7, 2024