Using temperature, per-species accuracy, and the number of samples per species, design a formula to display a calibrated confidence from the softmax output or raw logits.
Background: when a model outputs a prediction of a moth species, we don't know how confident the model is about that prediction. We use one common metric, the softmax score, as a proxy for confidence, but this value does not account for how much the model knows (or doesn't know) about each species, and softmax scores cannot be compared across different models. In most applied settings the softmax score is never displayed; the interface may only show "very sure" or "unsure" for a prediction. We need a way to compute a reliable threshold for when a prediction can be trusted, and when it should be questioned (or rolled up to a higher taxon rank).
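A minimal sketch of one possible way to combine the three signals into a single displayed confidence, assuming a calibrated temperature `T` is already available (temperature calibration itself is discussed in the comment below). All function names and the shrinkage constant `k` here are hypothetical illustrations, not part of any existing pipeline or a settled design: the temperature-scaled softmax of the top class is discounted by a per-species reliability term that shrinks per-species validation accuracy toward the global accuracy when a species has few samples.

```python
import numpy as np

def species_reliability(acc_s: float, n_s: int, acc_global: float, k: int = 50) -> float:
    """Shrink per-species validation accuracy toward the global accuracy.
    `k` (hypothetical) controls how many samples a species needs before its
    own accuracy estimate dominates the global prior."""
    return (n_s * acc_s + k * acc_global) / (n_s + k)

def calibrated_confidence(logits: np.ndarray, T: float,
                          acc_s: float, n_s: int, acc_global: float) -> float:
    """Temperature-scaled softmax probability of the top class, discounted by
    how reliably the model has historically identified that species."""
    z = logits / T
    z = z - z.max()                      # numerical stability
    p = np.exp(z) / np.exp(z).sum()      # temperature-scaled softmax
    return float(p.max() * species_reliability(acc_s, n_s, acc_global))
```

A single threshold on this combined score (e.g., "trust above 0.7, otherwise roll up to genus") could then be tuned on held-out data; the exact thresholds would need to be validated per deployment.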
Temperature calibration. Setting all the configuration details aside, it is known that NNs tend to be "too confident" when predicting classes [GPSW17, MDR+21]. Here, confidence means the probability that the prediction is correct (e.g., the softmax probability of the predicted class). The deep learning literature has developed methods to calibrate this confidence, i.e., to bring it closer to the true probability. The simplest and most common calibration factor is the temperature T, a single scalar that "softens" the softmax by dividing the logits; when the fitted T is close to 1, the estimated output probability is already close to the true probability, cf. [GPSW17].
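As a concrete illustration of temperature scaling in the sense of [GPSW17], the sketch below fits a single scalar T on held-out validation logits by minimizing the negative log-likelihood, then reuses it at inference time. `val_logits` (N x C), `val_labels` (N,), and `test_logits` are assumed to come from an existing validation/inference pipeline; the names are placeholders.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Row-wise softmax with temperature T applied to the logits."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(val_logits: np.ndarray, val_labels: np.ndarray) -> float:
    """Find T > 0 that minimizes the validation negative log-likelihood."""
    def nll(T: float) -> float:
        probs = softmax(val_logits, T)
        return -np.mean(np.log(probs[np.arange(len(val_labels)), val_labels] + 1e-12))
    res = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded")
    return float(res.x)

# Usage:
#   T_hat = fit_temperature(val_logits, val_labels)
#   calibrated = softmax(test_logits, T_hat)  # calibrated per-class confidences
```

Because T is fit per model, the resulting calibrated scores are also more comparable across models than raw softmax values, which is one of the concerns raised in the issue description.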