It would be great if the scorer could be called via a Python API. At the time of writing, the scorer can only be fed HIPE-compliant TSV files. This is a limitation for two reasons:
It makes it complicated to evaluate on the fly (e.g. at the end of each epoch).
It makes it necessary to rebuild words out of each model's tokens, which can be subtokens.
This second point can be very problematic, depending on your labelling strategy. Before sub-tokenizing, an input example may look like:
O B-PERS I-PERS I-PERS
The Australian Prime minister
A model like BERT could tokenize and label this example like so:
O B-PERS B-PERS I-PERS I-PERS
The Austral ##ian Prime minister
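This label expansion step can be sketched as follows. The `subtokenize` helper is a stand-in for a real WordPiece tokenizer (here mocked with a plain dict), and `expand_labels` simply repeats each word's label over its subtokens, matching the example above:

```python
# Sketch: expanding word-level labels to subtoken labels.
# `subtokenize` is a mock standing in for a real WordPiece tokenizer.

def expand_labels(words, labels, subtokenize):
    """Repeat each word's label over all of its subtokens."""
    sub_tokens, sub_labels = [], []
    for word, label in zip(words, labels):
        pieces = subtokenize(word)
        sub_tokens.extend(pieces)
        sub_labels.extend([label] * len(pieces))
    return sub_tokens, sub_labels

# Mock subtokenizer for the running example.
PIECES = {"Australian": ["Austral", "##ian"]}
subtokenize = lambda w: PIECES.get(w, [w])

words = ["The", "Australian", "Prime", "minister"]
labels = ["O", "B-PERS", "I-PERS", "I-PERS"]
tokens, tags = expand_labels(words, labels, subtokenize)
# tokens: ['The', 'Austral', '##ian', 'Prime', 'minister']
# tags:   ['O', 'B-PERS', 'B-PERS', 'I-PERS', 'I-PERS']
```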
However, at inference time, the model may predict something like:
O B-PERS I-PERS I-PERS I-PERS
The Austral ##ian Prime minister
To evaluate this prediction, you must first rebuild the words to match the ground-truth TSV. However, since Austral and ##ian have two different labels, it is not clear which one should be chosen.
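For illustration, here is one possible (but arbitrary) resolution: keep the first subtoken's label and discard the rest. This is only a sketch of a workaround, not something the scorer prescribes:

```python
# Sketch: rebuilding word-level predictions from subtoken predictions.
# The "first subtoken's label wins" rule below is one arbitrary choice
# among several; it silently drops the label on '##ian'.

def merge_subtokens(tokens, labels):
    """Glue '##' continuation pieces back onto their word,
    keeping only the first subtoken's label per word."""
    words, word_labels = [], []
    for token, label in zip(tokens, labels):
        if token.startswith("##"):       # continuation piece
            words[-1] += token[2:]       # re-attach it to the word
        else:
            words.append(token)
            word_labels.append(label)    # first-subtoken label wins
    return words, word_labels

tokens = ["The", "Austral", "##ian", "Prime", "minister"]
preds  = ["O", "B-PERS", "I-PERS", "I-PERS", "I-PERS"]
words, merged = merge_subtokens(tokens, preds)
# words:  ['The', 'Australian', 'Prime', 'minister']
# merged: ['O', 'B-PERS', 'I-PERS', 'I-PERS']
```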
If there were a possibility to feed the scorer with two simple list objects (prediction and ground-truth, in a seqeval-like fashion), things would be easier.
Though the aforementioned problem could be circumvented by labelling only the first sub-token, it would still be great to evaluate predictions on the fly, and even to have the API directly accessible via external frameworks such as HuggingFace.
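To make the request concrete, a list-based entry point could look like the sketch below. The `score_lists` function and its strict span-matching F1 are hypothetical, assumptions written in a seqeval-like style; the real scorer currently only reads TSV files:

```python
# Sketch of a hypothetical seqeval-like, list-based scoring interface.
# Nothing here is the actual HIPE scorer API.

def iob_spans(labels):
    """Extract (type, start, end) entity spans from an IOB sequence."""
    spans, start = [], None
    for i, label in enumerate(labels + ["O"]):  # sentinel closes open spans
        if label.startswith("B-") or label == "O":
            if start is not None:
                spans.append((labels[start][2:], start, i))
                start = None
        if label.startswith("B-"):
            start = i
    return spans

def score_lists(y_true, y_pred):
    """Strict entity-level F1 over two parallel label lists."""
    true_spans = set(iob_spans(y_true))
    pred_spans = set(iob_spans(y_pred))
    tp = len(true_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(true_spans) if true_spans else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

y_true = ["O", "B-PERS", "I-PERS", "I-PERS"]
y_pred = ["O", "B-PERS", "I-PERS", "I-PERS"]
print(score_lists(y_true, y_pred))  # 1.0
```

With such an interface, a training loop could call the scorer at the end of each epoch without ever serializing predictions to TSV.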
sven-nm changed the title from "[Feature request] : Make the scorer's directly accessible via a python API" to "[Feature request] : Make the scorer directly accessible via a python API" on Mar 4, 2022.