https://www.reddit.com/r/textdatamining/
https://en.m.wikipedia.org/wiki/Binary-to-text_encoding#Base58
https://lobste.rs/s/7ttwt8/aho_corasick_string_search
https://blog.floydhub.com/language-translator/ http://jalammar.github.io/illustrated-transformer/
https://explained.ai/decision-tree-viz/index.html
https://www.zverovich.net/2021/06/16/safe-formatting-api.html
https://mewo2.com/notes/markov-history/
https://github.com/apankrat/notes/tree/master/fast-case-conversion
https://github.com/pyjarrett/septum Context-based code search tool, Ada
https://www.cs.utexas.edu/users/moore/best-ideas/string-searching/
https://news.ycombinator.com/item?id=26910982 https://yurichev.com/news/20210421_boyer_moore/ https://news.ycombinator.com/item?id=26900640
https://www.linuxjournal.com/article/6652 How to Index Anything
https://github.com/valeriansaliou/sonic
https://blog.sqlitecloud.io/real-time-full-text-site-search-with-sqlite-fts5-extension
https://neuml.github.io/txtai/workflow/
https://nullprogram.com/blog/2017/10/06/
http://tapiov.net/unicodetiles.js/
https://github.com/qntm/base65536
https://rolisz.com/the-best-text-classification-library-for-a-quick-baseline/
https://devlog.hexops.com/2021/unicode-sorting-why-browsers-added-special-emoji-matching
https://baturin.org/blog/life-before-unicode/ ru
https://zig.news/dude_the_builder/unicode-string-operations-536e
https://heistak.github.io/your-code-displays-japanese-wrong/
https://gregtatum.com/writing/2021/diacritical-marks/
https://blog.unicode.org/2022/09/announcing-unicode-standard-version-150.
https://mcilloni.ovh/2023/07/23/unicode-is-hard/
http://www.figlet.org/fontdb.cgi
https://queue.acm.org/detail.cfm?id=1871406 To move forward with programming languages we need to break free from the tyranny of ASCII.
http://www.network-science.de/ascii/
https://blog.asciinema.org/post/smaller-faster/
https://madned.substack.com/p/ascii-double-murder
https://blogs.oracle.com/mysql/mysql%3a-character-sets%2c-unicode%2c-and-uca-compliant-collations
https://codewords.recurse.com/issues/seven/data-driven-literary-analysis
https://datatracker.ietf.org/doc/draft-faltstrom-base45/
https://kunststube.net/encoding/
https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/
https://news.ycombinator.com/item?id=27443528
https://github.com/gregdurrett/berkeley-doc-summarizer
https://medium.com/besedo-engineering/text-summarization-part-2-state-of-the-art-ae900e2ac55f
https://news.ycombinator.com/item?id=36470297
https://norvig.com/spell-correct.html
https://twitter.com/dm_0ney/status/1414742742530498566
https://code.visualstudio.com/blogs/2021/09/29/bracket-pair-colorization
https://www.ctrl.blog/entry/text-wrap-balance.html
https://adi.earth/apps/duplex/
https://news.ycombinator.com/item?id=41797271
https://www.bibtex.com/e/entry-types/
https://eggcorns.lascribe.net/
https://news.ycombinator.com/item?id=40530719
https://news.ycombinator.com/item?id=40254384
https://news.ycombinator.com/item?id=39614816
https://news.ycombinator.com/item?id=38427343
https://www.embopress.org/doi/full/10.15252/msb.202211325
https://learn.microsoft.com/en-us/windows/powertoys/text-extractor
https://saeedesmaili.com/demystifying-text-data-with-the-unstructured-python-library/
https://ionathan.ch/2023/06/06/angarr.html
https://www.oilshell.org/blog/2023/06/surrogate-pair.html
https://thephd.dev/cuneicode-and-the-future-of-text-in-c
https://stephenramsay.net/posts/groff-mom.html
https://www.stefanjudis.com/today-i-learned/how-to-split-javascript-strings-with-intl-segmenter/
https://news.ycombinator.com/item?id=35650699
https://buttondown.email/hillelwayne/archive/tag-systems/
https://blog.adacore.com/introduction-to-vss-library
https://github.com/pop-os/cosmic-text
https://github.com/neuml/paperetl
https://inventlikeanowner.com/blog/the-story-behind-asins-amazon-standard-identification-numbers/
https://rhodesmill.org/brandon/2012/one-sentence-per-line/
https://github.com/christianvoigt/argdown
https://www.openstenoproject.org/plover/ steno
https://www.linode.com/docs/guides/differences-between-grep-sed-awk/
https://lemire.me/blog/2022/12/30/quickly-checking-that-a-string-belongs-to-a-small-set/
https://raphlinus.github.io/text/2020/10/26/text-layout.html
https://en.wikipedia.org/wiki/Overlapping_markup
https://daniel.haxx.se/blog/2022/12/06/faster-base64-in-curl/
https://news.ycombinator.com/item?id=33767301
https://libs.suckless.org/libgrapheme/
https://arxiv.org/abs/2211.05166 Grammatical Error Correction: A Survey of the State of the Art
https://raphlinus.github.io/text/2022/11/08/minikin.html
https://github.com/qntm/base2048 twitter
https://github.com/kohlschutter/boilerpipe
https://en.wikipedia.org/wiki/Cistercian_numerals
https://omniglot.com/conscripts/fakoo.htm
https://blog.unicode.org/2022/09/announcing-icu4x-10.html
https://twitter.com/jonty/status/1571615998335123457
https://github.com/bartp5/libtexprintf
https://lwn.net/Articles/908032/
https://github.com/simdutf/simdutf
https://benhoyt.com/writings/count-words/
https://languagetool.org/en/dev Open-source Grammarly alternative
https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/
https://www.gnu.org/software/recutils/
https://manishearth.github.io/blog/2017/01/15/breaking-our-latin-1-assumptions/
https://news.ycombinator.com/item?id=31779260 Ask HN: I created a news shortening algorithm and am not sure how to utilize it
https://google-research.github.io/self-organising-systems/2022/diff-fsm/
https://dl.acm.org/doi/pdf/10.1145/3152823 FontCode: Embedding Information in Text Documents Using Glyph Perturbation
https://github.com/birchb1024/frangipanni text2tree
https://lemire.me/blog/2022/04/05/string-representations-are-not-unique-learn-to-normalize/
https://www.mcclimon.org/blog/writing-text-with-flag-emojis/
https://github.com/wolfgarbe/SymSpell Spelling correction & Fuzzy search
https://serhack.me/articles/unveiling-anonymous-author-stylometry-techniques/
https://blog.opensyllabus.org/about-the-open-syllabus-project/
https://github.com/neuml/txtai
https://github.com/larrykollar/Unix-Text-Processing
https://www.revk.uk/2022/02/crlf-has-long-history.html
https://arxiv.org/abs/2202.00848 Some Reflections on Drawing Causal Inference using Textual Data: Parallels Between Human Subjects and Organized Texts
https://drewdevault.com/2022/01/28/Implementing-mime-in-xxxx.html
https://github.com/Uzay-G/espial/blob/main/ARCHITECTURE.md
https://cendyne.dev/posts/2022-01-23-base64.html
https://davidamos.dev/why-cant-you-reverse-a-flag-emoji/
https://www.wired.com/story/kingdom-of-characters-jing-tsu-china-language-information/
https://quickwit.io/blog/quickwit-0.2/
https://blog.adamchalmers.com/nom-chars/
https://newscatcherapi.com/blog/ultimate-guide-to-text-similarity-with-python
http://transcultura.org/?q=node%2F8
https://www.carolemieux.com/arvada_ase21.pdf Learning Highly Recursive Input Grammars
http://defoe.sourceforge.net/folio/knuth-plass.html
https://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf TR-CS-06-02 A Comparison of Personal Name Matching: Techniques and Practical Issues
https://github.com/minimaxir/big-list-of-naughty-strings
https://web.stanford.edu/~jurafsky/slp3/ Speech and Language Processing