Model Cards

These model cards contain technical details of the models developed and used in PyThaiNLP.

Index

LST20 CLS
Thai NER

LST20 CLS

v0.2

Model Details

Developer : Wannaphong Phatthiyaphaibun
Model date : 2020-10-03
Model version : 0.2
PyThaiNLP version : 2.2.4 +
GitHub : https://github.com/PyThaiNLP/pythainlp/pull/479
CRF Model
License : CC0

Intended Use

Segmenting Thai text into clauses (units smaller than a sentence but larger than a word).
Not suitable for other languages or for non-news domains.

Factors

Based on known problems in Thai natural language processing.

Metrics

Evaluation metrics are precision, recall, and F1-score, reported per tag.
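As a rough illustration of how these three metrics are defined at the tag level, the sketch below computes them from a gold tag sequence and a predicted tag sequence. The function name `per_label_scores` and the toy tags are hypothetical and are not taken from PyThaiNLP or from the LST20 corpus; the actual evaluation script may differ.

```python
from collections import Counter


def per_label_scores(gold, pred):
    """Return {label: (precision, recall, f1)} for two equal-length tag lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but the true tag was different
            fn[g] += 1  # the true tag g was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy tags for illustration only (hand-written, not model output).
gold = ["B_CLS", "I_CLS", "E_CLS", "B_CLS", "I_CLS", "E_CLS"]
pred = ["B_CLS", "I_CLS", "I_CLS", "B_CLS", "I_CLS", "E_CLS"]
scores = per_label_scores(gold, pred)
```

In this toy case one `E_CLS` tag is mispredicted as `I_CLS`, which lowers recall for `E_CLS` and precision for `I_CLS` while leaving `B_CLS` untouched.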

Training Data

LST20 Corpus train set (news domain)

Evaluation Data

LST20 Corpus test set (news domain)

Quantitative Analyses

              precision    recall  f1-score   support

       B_CLS       0.90      0.94      0.92     16111
       E_CLS       0.90      0.94      0.92     15947
       I_CLS       0.99      0.97      0.98    169565

   micro avg       0.97      0.97      0.97    201623
   macro avg       0.93      0.95      0.94    201623
weighted avg       0.97      0.97      0.97    201623
 samples avg       0.94      0.94      0.94    201623
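To make the relationship between the per-tag rows and the `macro avg` row concrete, the following sketch recomputes the macro average and the total support from the three per-tag rows of the LST20 CLS report. The helper names are hypothetical. Because the inputs are already rounded to two decimals, only the unweighted (macro) average is recomputed here; support-weighted values recomputed from rounded inputs may not match the report exactly.

```python
# Per-tag rows copied from the LST20 CLS report: (precision, recall, f1, support).
report = {
    "B_CLS": (0.90, 0.94, 0.92, 16111),
    "E_CLS": (0.90, 0.94, 0.92, 15947),
    "I_CLS": (0.99, 0.97, 0.98, 169565),
}


def macro_average(rows):
    """Unweighted mean of precision, recall, and F1 across tags, rounded to 2 places."""
    n = len(rows)
    return tuple(round(sum(r[i] for r in rows.values()) / n, 2) for i in range(3))


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


macro = macro_average(report)
total_support = sum(r[3] for r in report.values())
b_cls_f1 = round(f1_from(0.90, 0.94), 2)
```

The macro average treats the three tags equally, so the two boundary tags pull it below the micro and weighted averages, which are dominated by the far more frequent `I_CLS` tag.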

Ethical Considerations

None identified.

Caveats and Recommendations

Input text must be segmented into words before it is passed to this model.
The model supports Thai text only.
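The sketch below illustrates why word segmentation comes first: the model assigns one tag per word, and clauses are recovered by grouping the words according to those tags. It assumes that `B_CLS`, `I_CLS`, and `E_CLS` mark the beginning, inside, and end of a clause, which is inferred from the tag names in the report. The function `group_clauses`, the hand-segmented words, and the hand-written tags are hypothetical and do not come from PyThaiNLP or from real model output.

```python
def group_clauses(words, tags):
    """Group pre-segmented words into clauses using B_CLS / I_CLS / E_CLS tags."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    clauses, current = [], []
    for word, tag in zip(words, tags):
        # A new B_CLS closes any clause that was left open.
        if tag == "B_CLS" and current:
            clauses.append(current)
            current = []
        current.append(word)
        # E_CLS closes the clause that includes this word.
        if tag == "E_CLS":
            clauses.append(current)
            current = []
    if current:  # keep trailing words that never saw an E_CLS
        clauses.append(current)
    return clauses


# Hand-segmented words and hand-written tags, for illustration only.
words = ["ฝน", "ตก", "หนัก", "ถนน", "ลื่น"]
tags = ["B_CLS", "I_CLS", "E_CLS", "B_CLS", "E_CLS"]
clauses = group_clauses(words, tags)
```

Here the five words are grouped into two clauses, one of three words and one of two.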


Thai NER

v1.4

Model Details

Developer : Wannaphong Phatthiyaphaibun
Model date : 2020-05-21
Model version : 1.4
PyThaiNLP version : 2.2 +
CRF Model
License : CC0
GitHub for Thai NER 1.4 (Data and train notebook) : https://github.com/wannaphong/thai-ner/tree/master/model/1.4

Intended Use

Named-entity tagging for Thai text.
Not suitable for other languages or for non-news domains.

Factors

Based on known problems in Thai natural language processing.

Metrics

Evaluation metrics are precision, recall, and F1-score, reported per tag.

Training Data

ThaiNER 1.3 Corpus train set

Evaluation Data

ThaiNER 1.3 Corpus test set

Quantitative Analyses

                precision    recall  f1-score   support

        B-DATE       0.92      0.86      0.89       375
        I-DATE       0.94      0.94      0.94       747
       B-EMAIL       1.00      1.00      1.00         5
       I-EMAIL       1.00      1.00      1.00        28
         B-LAW       0.71      0.56      0.62        43
         I-LAW       0.74      0.70      0.72       154
         B-LEN       0.96      0.93      0.95        29
         I-LEN       0.98      0.94      0.96        69
    B-LOCATION       0.88      0.77      0.82       864
    I-LOCATION       0.86      0.73      0.79       852
       B-MONEY       0.98      0.85      0.91       105
       I-MONEY       0.96      0.95      0.95       239
B-ORGANIZATION       0.90      0.78      0.84      1166
I-ORGANIZATION       0.84      0.77      0.81      1338
     B-PERCENT       1.00      0.97      0.99        34
     I-PERCENT       1.00      0.96      0.98        51
      B-PERSON       0.96      0.82      0.88       676
      I-PERSON       0.94      0.92      0.93      2424
       B-PHONE       1.00      0.72      0.84        29
       I-PHONE       0.96      0.92      0.94        78
        B-TIME       0.87      0.73      0.79       172
        I-TIME       0.94      0.83      0.88       336
         B-URL       0.89      1.00      0.94        24
         I-URL       0.96      1.00      0.98       371
         B-ZIP       1.00      1.00      1.00         4

     micro avg       0.91      0.84      0.87     10213
     macro avg       0.93      0.87      0.89     10213
  weighted avg       0.91      0.84      0.87     10213
   samples avg       0.17      0.17      0.17     10213

Ethical Considerations

None identified.

Caveats and Recommendations

Input text must be segmented into words before it is passed to this model.
The model supports Thai text only.
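Because the model tags words rather than raw text, entity spans have to be reassembled from per-word tags. The sketch below shows one way to do this for the `B-`/`I-` tags listed in the report. The function `extract_entities` is hypothetical, the `O` tag for words outside any entity is an assumption (it does not appear in the report), and the words and tags are hand-written rather than real model output.

```python
def extract_entities(words, tags):
    """Collect (entity_type, text) pairs from B-/I- tags over pre-segmented words."""
    entities = []
    current_type, current_words = None, []

    def flush():
        # Store the open entity, if any, then reset the buffer.
        nonlocal current_type, current_words
        if current_type is not None:
            entities.append((current_type, "".join(current_words)))
        current_type, current_words = None, []

    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_words.append(word)
        else:
            # Any other tag (assumed "O"), or an I- tag that does not continue
            # the open entity, ends the entity; the word itself is not kept.
            flush()
    flush()
    return entities


# Hand-segmented words and hand-written tags, for illustration only.
words = ["นาย", "สมชาย", "ไป", "เชียงใหม่"]
tags = ["B-PERSON", "I-PERSON", "O", "B-LOCATION"]
entities = extract_entities(words, tags)
```

Words are joined without spaces because Thai does not separate words with whitespace; a different joining rule may be needed for entities such as URLs or phone numbers.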
