This paper proposes a method for automatic POS (part-of-speech) guessing of Chinese unknown words. The method consists of two models. The first model uses a machine-learning approach to predict the POS of an unknown word from its internal component features. The credibility of each prediction is then measured; for low-credibility words, a second model revises the first model's output using those words' global context information. In the experiments, the first model achieves 93.40% precision for all words and 86.60% for disyllabic words, a significant improvement over the best previously reported results of 89% precision for all words and 74% for disyllabic words. The second model further improves precision by 0.80% for all words and 1.30% for disyllabic words.
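To make the two-stage pipeline concrete, below is a minimal sketch in Python. The abstract does not specify the learning algorithm, the internal component features, the credibility measure, or how global context is aggregated, so everything here is an illustrative assumption: a logistic-regression classifier over simple character features stands in for the first model, its predicted class probability serves as the credibility score, the threshold value is arbitrary, and the second model is approximated by a majority vote over the tags assigned to the same word's other occurrences in the corpus.

```python
# Illustrative sketch of the two-stage POS-guessing pipeline.
# The features, threshold, and voting scheme are assumptions,
# not the paper's exact method.
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

CREDIBILITY_THRESHOLD = 0.7  # assumed cutoff for "low-credibility" predictions


def internal_features(word):
    """Model 1 features: internal components of the unknown word.
    Real systems would use richer character/morpheme features."""
    return {
        "first_char": word[0],
        "last_char": word[-1],
        "length": len(word),
    }


class TwoStagePosGuesser:
    def __init__(self):
        self.vectorizer = DictVectorizer()
        self.model1 = LogisticRegression(max_iter=1000)

    def fit(self, words, tags):
        # Train model 1 on (word, POS) pairs using internal features only.
        X = self.vectorizer.fit_transform([internal_features(w) for w in words])
        self.model1.fit(X, tags)

    def predict(self, word, context_tags):
        """context_tags: POS tags of the same word's other occurrences in
        the corpus, standing in for its 'global context information'."""
        X = self.vectorizer.transform([internal_features(word)])
        probs = self.model1.predict_proba(X)[0]
        best = probs.argmax()
        tag, credibility = self.model1.classes_[best], probs[best]
        if credibility >= CREDIBILITY_THRESHOLD or not context_tags:
            return tag  # high credibility: keep model 1's prediction
        # Model 2 (assumed form): revise low-credibility predictions by
        # majority vote over the word's global occurrences.
        return Counter(context_tags).most_common(1)[0][0]
```

The design point the sketch captures is the division of labor described in the abstract: the first model commits to a tag whenever its prediction is credible on internal evidence alone, and only the residual low-credibility cases are passed to the second, context-driven model for revision.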