Classical Chinese differs fundamentally from Modern Chinese in both syntax and morphology. Although a number of recent works address part-of-speech (PoS) tagging for Modern Chinese, PoS tagging for Classical Chinese has been largely neglected; to the best of our knowledge, this is the first work in the area. Fortunately, Classical Chinese is easier to tag than Modern Chinese in one respect: most Classical Chinese words consist of a single character, so no word segmentation is needed. In this paper, we propose and analyze a simple statistical approach to PoS tagging of Classical Chinese. We first design a tagset for Classical Chinese, which is later shown to be accurate and efficient. We then apply the Viterbi algorithm for hidden Markov models (HMMs) and make several improvements, including sparse-data handling and unknown-word guessing, both designed specifically for Classical Chinese. As the training set grows, the accuracies of the bigram and trigram models rise to 94.9% and 97.6%, respectively. Our work also contributes by identifying and solving several previously unencountered problems in processing Classical Chinese.
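
To make the bigram HMM tagging step concrete, the following is a minimal Python sketch of Viterbi decoding over single-character tokens. It is an illustration under stated assumptions, not the method of the paper: the tag names (`n`, `v`, `c`, `p`, `d`), the three-sentence toy corpus, and the function names `train` and `viterbi` are all hypothetical, and plain add-one smoothing stands in for the sparse-data handling and unknown-word guessing that the abstract mentions but does not specify.

```python
import math
from collections import Counter

def train(tagged_sentences):
    """Collect transition and emission counts from (character, tag) sentences."""
    trans, emit, tag_totals = Counter(), Counter(), Counter()
    vocab = set()
    for sent in tagged_sentences:
        prev = "<s>"                      # sentence-start pseudo-tag
        tag_totals[prev] += 1
        for word, tag in sent:
            trans[(prev, tag)] += 1
            emit[(tag, word)] += 1
            tag_totals[tag] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(t for t in tag_totals if t != "<s>")
    return trans, emit, tag_totals, tags, vocab

def viterbi(words, model):
    """Return the most probable tag sequence under a bigram HMM."""
    if not words:
        return []
    trans, emit, tag_totals, tags, vocab = model
    n_tags = len(tags)
    n_words = len(vocab) + 1              # +1 reserves mass for unseen characters

    # Add-one smoothing: an assumed placeholder, not the paper's technique.
    def log_trans(prev, tag):
        return math.log((trans[(prev, tag)] + 1) / (tag_totals[prev] + n_tags))

    def log_emit(tag, word):
        return math.log((emit[(tag, word)] + 1) / (tag_totals[tag] + n_words))

    # Each column maps tag -> (best log-probability, backpointer to previous tag).
    best = [{t: (log_trans("<s>", t) + log_emit(t, words[0]), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + log_trans(p, t), p) for p in tags)
            column[t] = (score + log_emit(t, word), prev)
        best.append(column)

    # Follow backpointers from the best final tag to recover the path.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical toy corpus; each character is one token, so no segmentation step.
corpus = [
    [("子", "n"), ("曰", "v")],
    [("學", "v"), ("而", "c"), ("習", "v"), ("之", "p")],
    [("人", "n"), ("不", "d"), ("知", "v")],
]
model = train(corpus)
print(viterbi(["子", "曰"], model))
```

Because each character is treated as a token, the input to `viterbi` is simply the list of characters in a sentence. A trigram variant would condition each transition on the two preceding tags rather than one, at the cost of sparser counts.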