
Crawler for Cantonese pronunciation data on LSHK Jyutping Word List (香港語言學學會粵拼詞表)


freestanding-binary/lshk-word-list-crawler

 
 


lshk-word-list-crawler

Crawler for Cantonese pronunciation data on LSHK Jyutping Word List (香港語言學學會粵拼詞表)

See sanitized.txt for the final result.

File structure

  • lshk.py: The crawler
  • result.txt: Raw result output by the crawler
  • sanitize.py: Sanitizer for the result
  • sanitized.txt: Final result output by the sanitizer
  • sanitize_log.txt: Log of the sanitization run
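
The flow implied by the file list (raw result, then sanitizer, then final result plus a log) can be sketched as follows. This is a minimal illustration and not the actual contents of sanitize.py: the function name `sanitize_lines`, the tab-separated `word<TAB>jyutping` line format, and the validation rule are all assumptions made for the example.

```python
import re

# Assumption: a Jyutping syllable is lowercase ASCII letters plus a tone digit 1-6.
# This is a deliberately loose check, not the full Jyutping syllable inventory.
SYLLABLE = re.compile(r"[a-z]+[1-6]")


def sanitize_lines(lines):
    """Split raw lines into (clean_entries, log_messages).

    Hypothetical input format: one entry per line, "word<TAB>jyutping".
    """
    clean, log = [], []
    seen = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # silently skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            log.append(f"line {lineno}: expected 2 tab-separated fields: {line!r}")
            continue
        word = parts[0].strip()
        # Normalize case and collapse repeated spaces between syllables.
        syllables = parts[1].strip().lower().split()
        if not word or not syllables or not all(SYLLABLE.fullmatch(s) for s in syllables):
            log.append(f"line {lineno}: invalid reading: {line!r}")
            continue
        entry = (word, " ".join(syllables))
        if entry in seen:
            log.append(f"line {lineno}: duplicate entry: {line!r}")
            continue
        seen.add(entry)
        clean.append(entry)
    return clean, log
```

Under these assumptions, the clean entries would correspond to sanitized.txt and the log messages to sanitize_log.txt; the real files may use a different layout.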

License

According to the original terms, the dictionary data is distributed under CC BY 4.0.

The Python code in this repository is distributed under the MIT license.

Disclaimer

The link to the word list is now broken. For a more up-to-date word list, see rime/rime-cantonese.
