nlpO3 Python binding

Python binding for nlpO3, a Thai natural language processing library in Rust.

Features

Thai word tokenizer
- segment() - uses a maximal-matching, dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
  - 2.5x faster than a similar pure-Python implementation (PyThaiNLP's newmm)
- load_dict() - loads a dictionary from a plain text file (one word per line)
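The following minimal pure-Python sketch illustrates the idea behind maximal matching: among all ways to split the text into dictionary words, it picks one with the fewest words. This is a simplification, not nlpO3's implementation. It ignores Thai Character Cluster boundaries entirely. The function name `maximal_matching` is hypothetical, and so is the rule that characters no dictionary word covers become single-character tokens.

```python
from typing import List, Optional, Set, Tuple


def maximal_matching(text: str, words: Set[str]) -> List[str]:
    """Segment `text` into as few dictionary words as possible (simplified sketch).

    Hypothetical helper: characters not covered by any dictionary word become
    single-character "unknown" tokens. Segmentations with fewer unknown tokens
    are preferred first, then those with fewer tokens overall.
    Thai Character Cluster boundaries are NOT enforced here.
    """
    n = len(text)
    max_len = max([1] + [len(w) for w in words])
    # best[i] = (unknown_count, token_count, prev_index) for the prefix text[:i]
    best: List[Optional[Tuple[int, int, int]]] = [None] * (n + 1)
    best[0] = (0, 0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev = best[start]
            if prev is None:
                continue
            piece = text[start:end]
            if piece in words:
                penalty = 0
            elif end - start == 1:
                penalty = 1  # fallback: treat one unknown character as a token
            else:
                continue
            candidate = (prev[0] + penalty, prev[1] + 1, start)
            current = best[end]
            if current is None or candidate[:2] < current[:2]:
                best[end] = candidate
    # Walk back through the stored predecessor indices to recover the tokens.
    tokens: List[str] = []
    end = n
    while end > 0:
        start = best[end][2]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]
```

For example, `maximal_matching("สวัสดีครับ", {"สวัสดี", "ครับ"})` returns `['สวัสดี', 'ครับ']`. Because it stores only the best predecessor for each position, the sketch runs in time proportional to the text length times the longest word length. It never enumerates every segmentation.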

Dictionary file

To keep the library small, nlpO3 does not assume which dictionary the developer would like to use, and it does not ship with one. The dictionary-based word tokenizer requires a dictionary.
For a tokenization dictionary, try:
- words_th.txt from PyThaiNLP - around 62,000 words (CC0)
- word break dictionary from libthai - consists of dictionaries in different categories, with a make script (LGPL-2.1)
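load_dict() reads a plain text file with one word per line. The sketch below shows one way to prepare such a file from a Python list, for example after merging or filtering entries from the dictionaries listed above. The helper names `write_dict_file` and `read_dict_file` are hypothetical. UTF-8 encoding is also an assumption, since the file encoding is not stated here.

```python
from pathlib import Path
from typing import Iterable, List


def write_dict_file(words: Iterable[str], path: str) -> int:
    """Write unique, non-empty words to `path`, one per line (UTF-8 assumed).

    Hypothetical helper. Surrounding whitespace is stripped and duplicates are
    dropped while keeping first-seen order. Returns the number of words written.
    """
    seen = set()
    unique: List[str] = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            unique.append(word)
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return len(unique)


def read_dict_file(path: str) -> List[str]:
    """Read a one-word-per-line file back into a list, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

The resulting file can then be loaded with `load_dict("my_dict.txt", "dict_name")` as described under Usage.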

Install

pip install nlpo3

Usage

Load the file path/to/dict.file into memory and assign it the name dict_name. Then tokenize a text with the dict_name dictionary:

from nlpo3 import load_dict, segment

load_dict("path/to/dict.file", "dict_name")
segment("สวัสดีครับ", "dict_name")

It returns a list of strings:

['สวัสดี', 'ครับ']

(The result depends on the words included in the dictionary.)

Use multithreaded mode, again with the dict_name dictionary:

segment("สวัสดีครับ", dict_name="dict_name", parallel=True)

Use safe mode to avoid long wait times in edge cases where the text has many ambiguous word boundaries:

segment("สวัสดีครับ", dict_name="dict_name", safe=True)
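Ambiguous word boundaries can be expensive because the number of possible segmentations can grow exponentially with text length. The sketch below only counts segmentations with dynamic programming, to illustrate that growth. It does not show how nlpO3 searches for a segmentation or how its safe mode limits the work, since neither is described here. The function name `count_segmentations` is hypothetical.

```python
from typing import Set


def count_segmentations(text: str, words: Set[str]) -> int:
    """Count the ways `text` can be split entirely into dictionary words.

    Hypothetical illustration only: ways[i] is the number of segmentations of
    the prefix text[:i], built up from shorter prefixes.
    """
    n = len(text)
    ways = [0] * (n + 1)
    ways[0] = 1  # exactly one way to segment the empty prefix
    for end in range(1, n + 1):
        for start in range(end):
            if ways[start] and text[start:end] in words:
                ways[end] += ways[start]
    return ways[n]
```

With the two-word dictionary `{"a", "aa"}`, a run of 10 `"a"` characters has 89 segmentations and a run of 20 has 10,946. The count follows the Fibonacci numbers, so a tokenizer that explores alternatives naively can slow down sharply on such input.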

Build

Requirements

- Rust 2018 Edition
- Python 3.6 or newer
- Python development headers
  - Ubuntu: sudo apt-get install python3-dev
  - macOS: no action needed
- PyO3 (already included in Cargo.toml)
- setuptools-rust

Steps

python -m pip install --upgrade build
python -m build

This should generate a wheel file in the dist/ directory, which can be installed with pip.
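By default, `python -m build` writes both a source distribution (.tar.gz) and a wheel (.whl) to dist/, and repeated builds can leave older wheels behind. The small helper below picks the most recently modified wheel so it can be handed to pip. The name `find_latest_wheel` is hypothetical, and using modification time as the "latest" criterion is an assumption.

```python
from pathlib import Path
from typing import Optional


def find_latest_wheel(dist_dir: str = "dist") -> Optional[Path]:
    """Return the most recently modified .whl file in `dist_dir`, or None.

    Hypothetical helper: source distributions (.tar.gz) are ignored.
    """
    wheels = list(Path(dist_dir).glob("*.whl"))
    if not wheels:
        return None
    return max(wheels, key=lambda p: p.stat().st_mtime)
```

The returned path can then be installed with, for example, `subprocess.run([sys.executable, "-m", "pip", "install", str(wheel)], check=True)`.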

Issues

Please report issues at https://github.com/PyThaiNLP/nlpo3/issues
