Model Cards
Pattarawat Chormai edited this page Oct 31, 2020
These model cards contain technical details of the models developed and used in PyThaiNLP.
Model Details
- Developer : Wannaphong Phatthiyaphaibun
- Model date : 2020-10-03
- Model version : 0.2
- PyThaiNLP version : 2.2.4 or later
- GitHub : https://github.com/PyThaiNLP/pythainlp/pull/479
- CRF Model
- License : CC0
Intended Use
- Segmenting Thai text into clauses (units smaller than a sentence but larger than a word).
- Not suitable for other languages or for non-news domains.
Factors
- Based on known problems in Thai natural language processing.
Metrics
- Evaluation metrics are precision, recall, and F1-score, reported per tag and as averages.
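For reference, the three metrics can be derived from per-tag counts of true positives, false positives, and false negatives. The sketch below is illustrative only: the function name and the counts are hypothetical and are not taken from the PyThaiNLP evaluation code.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one tag from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for a single tag (not from the model's evaluation).
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# precision=0.90 recall=0.75 f1=0.82
```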
Training Data
- LST20 Corpus, train set (news domain)
Evaluation Data
- LST20 Corpus, test set (news domain)
Quantitative Analyses
Tag             precision  recall  f1-score  support
B_CLS                0.90    0.94      0.92    16111
E_CLS                0.90    0.94      0.92    15947
I_CLS                0.99    0.97      0.98   169565
micro avg            0.97    0.97      0.97   201623
macro avg            0.93    0.95      0.94   201623
weighted avg         0.97    0.97      0.97   201623
samples avg          0.94    0.94      0.94   201623
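As a sanity check on the table above, the macro average is the unweighted mean of the per-tag scores. The sketch below recomputes it from the rounded per-tag values in the table; the variable and function names are illustrative. The weighted average is left out because recomputing it from values rounded to two decimals can differ in the last digit.

```python
# Per-tag (precision, recall, f1) as reported in the table, rounded to 2 decimals.
per_tag = {
    "B_CLS": (0.90, 0.94, 0.92),
    "E_CLS": (0.90, 0.94, 0.92),
    "I_CLS": (0.99, 0.97, 0.98),
}


def macro_average(scores):
    """Unweighted mean of each metric across tags, rounded to 2 decimals."""
    n = len(scores)
    columns = zip(*scores.values())  # one column per metric
    return tuple(round(sum(col) / n, 2) for col in columns)


macro = macro_average(per_tag)
print(macro)  # (0.93, 0.95, 0.94), matching the "macro avg" row
```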
Ethical Considerations
- None identified.
Caveats and Recommendations
- Input text must be word-segmented before it is passed to this model.
- Thai text only
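The model assigns one of the tags B_CLS, I_CLS, or E_CLS to each pre-segmented word. Below is a minimal sketch of how such tags could be grouped back into clauses, assuming B_CLS opens a clause and E_CLS closes it. The function name, the placeholder words, and the tag sequence are hypothetical, and this is not the PyThaiNLP implementation.

```python
from typing import List


def group_clauses(words: List[str], tags: List[str]) -> List[List[str]]:
    """Group words into clauses from per-word B_CLS / I_CLS / E_CLS tags."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    clauses: List[List[str]] = []
    current: List[str] = []
    for word, tag in zip(words, tags):
        if tag == "B_CLS" and current:
            # A new clause starts before the previous one was closed.
            clauses.append(current)
            current = []
        current.append(word)
        if tag == "E_CLS":
            clauses.append(current)
            current = []
    if current:
        # Flush a trailing clause that had no explicit end tag.
        clauses.append(current)
    return clauses


# Placeholder tokens stand in for the output of a Thai word tokenizer.
words = ["w1", "w2", "w3", "w4", "w5"]
tags = ["B_CLS", "I_CLS", "E_CLS", "B_CLS", "E_CLS"]
print(group_clauses(words, tags))  # [['w1', 'w2', 'w3'], ['w4', 'w5']]
```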
Model Details
- Developer : Wannaphong Phatthiyaphaibun
- Model date : 2020-05-21
- Model version : 1.4
- PyThaiNLP version : 2.2 or later
- CRF Model
- License : CC0
- GitHub for Thai NER 1.4 (Data and train notebook) : https://github.com/wannaphong/thai-ner/tree/master/model/1.4
Intended Use
- Named-entity tagging for Thai text.
- Not suitable for other languages or for non-news domains.
Factors
- Based on known problems in Thai natural language processing.
Metrics
- Evaluation metrics are precision, recall, and F1-score, reported per tag and as averages.
Training Data
- ThaiNER 1.3 Corpus, train set
Evaluation Data
- ThaiNER 1.3 Corpus, test set
Quantitative Analyses
Tag             precision  recall  f1-score  support
B-DATE               0.92    0.86      0.89      375
I-DATE               0.94    0.94      0.94      747
B-EMAIL              1.00    1.00      1.00        5
I-EMAIL              1.00    1.00      1.00       28
B-LAW                0.71    0.56      0.62       43
I-LAW                0.74    0.70      0.72      154
B-LEN                0.96    0.93      0.95       29
I-LEN                0.98    0.94      0.96       69
B-LOCATION           0.88    0.77      0.82      864
I-LOCATION           0.86    0.73      0.79      852
B-MONEY              0.98    0.85      0.91      105
I-MONEY              0.96    0.95      0.95      239
B-ORGANIZATION       0.90    0.78      0.84     1166
I-ORGANIZATION       0.84    0.77      0.81     1338
B-PERCENT            1.00    0.97      0.99       34
I-PERCENT            1.00    0.96      0.98       51
B-PERSON             0.96    0.82      0.88      676
I-PERSON             0.94    0.92      0.93     2424
B-PHONE              1.00    0.72      0.84       29
I-PHONE              0.96    0.92      0.94       78
B-TIME               0.87    0.73      0.79      172
I-TIME               0.94    0.83      0.88      336
B-URL                0.89    1.00      0.94       24
I-URL                0.96    1.00      0.98      371
B-ZIP                1.00    1.00      1.00        4
micro avg            0.91    0.84      0.87    10213
macro avg            0.93    0.87      0.89    10213
weighted avg         0.91    0.84      0.87    10213
samples avg          0.17    0.17      0.17    10213
Ethical Considerations
- None identified.
Caveats and Recommendations
- Input text must be word-segmented before it is passed to this model.
- Thai text only
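The tagger labels each pre-segmented word with a B- or I- tag per entity type, as listed in the table. Below is a minimal sketch of how such a tag sequence could be collapsed into entity spans. It assumes that words outside any entity carry the tag "O", which the table does not show. The function name and the placeholder words are hypothetical, and this is not the PyThaiNLP implementation.

```python
from typing import List, Optional, Tuple


def decode_entities(words: List[str], tags: List[str]) -> List[Tuple[str, List[str]]]:
    """Collect (entity_type, words) spans from B-/I- tags; other tags end a span."""
    entities: List[Tuple[str, List[str]]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []
    for word, tag in zip(words, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" and label:
            # A B- tag always starts a new span, closing any open one.
            if current_type is not None:
                entities.append((current_type, current_words))
            current_type, current_words = label, [word]
        elif prefix == "I" and label and label == current_type:
            current_words.append(word)
        else:
            # "O" (assumed outside tag) or an I- tag with no matching open span.
            if current_type is not None:
                entities.append((current_type, current_words))
            current_type, current_words = None, []
    if current_type is not None:
        entities.append((current_type, current_words))
    return entities


# Placeholder tokens stand in for the output of a Thai word tokenizer.
words = ["w1", "w2", "w3", "w4", "w5"]
tags = ["B-PERSON", "I-PERSON", "O", "B-LOCATION", "I-LOCATION"]
print(decode_entities(words, tags))
# [('PERSON', ['w1', 'w2']), ('LOCATION', ['w4', 'w5'])]
```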