Confidence Calibration #43

Open
adityajain07 opened this issue Sep 12, 2024 · 1 comment
Labels: enhancement (New feature or request)

@adityajain07
Contributor

adityajain07 commented Sep 12, 2024

Using temperature, per-species accuracy, and the number of samples per species, design a formula that displays a calibrated confidence derived from the softmax scores or raw logits.

Background: when the model outputs a prediction of a moth species, we don't know how confident it is about that prediction. We use one common metric, the softmax score, as a proxy for confidence, but this value does not account for how much the model knows (or doesn't know) about each species, and softmax scores cannot be compared across different models. In most applied settings, the softmax score is never displayed directly; the interface may instead show "very sure" or "unsure" about a prediction. We need a way to compute a reliable threshold for when predictions can be trusted and when they should be questioned (or rolled up to a higher taxon rank).
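As one possible starting point (a rough sketch only, not a settled formula), the pieces above could be blended roughly like this in Python: a temperature-scaled softmax score, discounted by the species' held-out accuracy and by whether the species has enough training samples, then mapped to the interface wording. Every function name, threshold, and the blending rule itself are illustrative assumptions, not the formula this issue asks for.

```python
# Hypothetical sketch: blend a temperature-scaled softmax score with
# per-species accuracy and per-species sample counts into one display
# confidence. Names, thresholds, and the blending rule are assumptions.
import numpy as np

def temperature_softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Softmax over raw logits, softened by temperature T (T > 1 flattens)."""
    z = logits / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def calibrated_confidence(
    logits: np.ndarray,
    T: float,
    species_accuracy: np.ndarray,  # per-species accuracy on held-out data
    species_counts: np.ndarray,    # training samples per species
    min_count: int = 50,           # assumed cutoff for "enough data"
) -> tuple[int, float]:
    """Return (predicted class index, blended confidence in [0, 1])."""
    probs = temperature_softmax(logits, T)
    pred = int(np.argmax(probs))
    # Discount the softmax score by how well the model actually performs on
    # this species and by whether the species had enough training examples.
    data_factor = min(1.0, species_counts[pred] / min_count)
    confidence = probs[pred] * species_accuracy[pred] * data_factor
    return pred, float(confidence)

def display_label(confidence: float, sure: float = 0.8, unsure: float = 0.4) -> str:
    """Map the blended confidence onto the interface wording."""
    if confidence >= sure:
        return "very sure"
    if confidence >= unsure:
        return "unsure"
    return "roll up to higher taxon"
```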

@adityajain07 adityajain07 added the enhancement New feature or request label Sep 12, 2024
@adityajain07 adityajain07 self-assigned this Sep 12, 2024
@adityajain07 adityajain07 changed the title from "A reliable confidence metric" to "Improved confidence metric" Sep 12, 2024
@mihow
Collaborator

mihow commented Sep 13, 2024

Temperature calibration. Setting all the configuration details aside, it is well known that neural networks tend to be “too confident” when predicting classes [GPSW17, MDR+21]. Here, confidence means the probability that the prediction is correct (e.g., the softmax probability for the predicted class). In the deep learning literature, methods have been developed to calibrate this confidence, i.e., to bring it closer to the true probability. One of the simplest and most common calibration factors is the temperature, T, which “softens” the softmax; a fitted T close to 1 indicates that the estimated output probability is already close to its true probability, cf. [GPSW17].

I believe this is the paper that introduced the concept of temperature calibration and outlines a practical implementation (2017):
https://arxiv.org/abs/1706.04599
Here is a survey of multiple calibration methods (updated in 2024):
https://arxiv.org/abs/2308.01222
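For reference, a minimal sketch of that procedure (temperature scaling as in Guo et al.) in NumPy: fit a single scalar T on held-out validation logits by minimizing negative log-likelihood, then divide logits by T before the softmax at inference time. The grid search is an illustrative simplification; the paper optimizes the same NLL objective with a gradient-based optimizer, and the array shapes here are assumptions.

```python
# Minimal temperature-scaling sketch (cf. Guo et al., 2017): learn one
# scalar T on validation logits by minimizing NLL; grid search is used
# here only for simplicity.
import numpy as np

def nll(logits: np.ndarray, labels: np.ndarray, T: float) -> float:
    """Mean negative log-likelihood of true labels under softmax(logits / T)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def fit_temperature(val_logits: np.ndarray, val_labels: np.ndarray) -> float:
    """Pick the T > 0 that minimizes validation NLL."""
    grid = np.linspace(0.5, 5.0, 91)
    return float(min(grid, key=lambda T: nll(val_logits, val_labels, T)))

# Usage sketch:
#   T = fit_temperature(val_logits, val_labels)
#   calibrated_probs = softmax(test_logits / T)  # e.g. temperature_softmax above
```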

@adityajain07 adityajain07 changed the title from "Improved confidence metric" to "Confidence Calibration" Sep 13, 2024