Japanese Language Technology
This post is part of a collection on Collections.
The Japanese language has a unique orthography that presents many challenges for technology. Dealing with those challenges, and making computers easier to use with Japanese, is a major focus of my work. The articles here cover both tools I've made for working with Japanese and documentation of particular problems, resources, and phenomena.
- A Field Guide to Japanese Mojibake
- Announcing Introduction to Japanese NLP
- Kanji Club: Search Kanji by Parts with Instant Feedback
- Parsing the Infamous Japanese Postal CSV
- How to Tokenize Japanese in Python
- A Short History of Romaji
- cutlet: a Japanese to Romaji Converter in Python
- An Overview of Japanese Tokenizer Dictionaries
- A Spectre is Haunting Unicode