This paper presents a modified class-based language model (LM) approach to Chinese unknown word identification. Unknown word identification is treated as a classification problem in which the part of speech of each unknown word is defined as its class. Three types of features (a contextual class feature, a word juncture model, and word formation patterns) are combined within a class-based LM framework to identify unknown words in a sequence of known words. In addition to identification, the class-based LM approach also provides a solution for unknown word tagging. Experimental results show that most unknown words in Chinese texts can be resolved effectively by the proposed approach.
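To make the general idea concrete, the following is a minimal sketch of how a generic class-based bigram LM can rank competing analyses of the same character string. It is not the paper's model: the abstract does not give the exact formulation, so the factorization used here (a class transition probability times a word-given-class probability), the tag names `n`, `nr`, and `v`, the example sentence, and every probability value are illustrative assumptions. In particular, the single `EMISSION` table stands in for whatever the word juncture model and word formation patterns would contribute.

```python
import math

# All numbers below are made up for illustration; none come from the paper.

# Assumed class transition model P(class_i | class_{i-1}).
# "<s>" is a hypothetical sentence-start marker.
TRANSITION = {
    ("<s>", "n"): 0.5,
    ("<s>", "nr"): 0.2,
    ("n", "n"): 0.3,
    ("n", "v"): 0.4,
    ("nr", "v"): 0.5,
}

# Assumed word-given-class model P(word | class). For an unknown word this
# value would have to be estimated from its internal structure; here it is
# simply a fixed toy number.
EMISSION = {
    ("王", "n"): 0.01,
    ("小明", "n"): 0.001,
    ("王小明", "nr"): 0.005,  # candidate unknown word, class = person name
    ("来", "v"): 0.05,
}

FLOOR = 1e-8  # fallback probability for unseen events


def log_score(analysis):
    """Log-probability of a list of (word, class) pairs under the toy model."""
    total = 0.0
    prev = "<s>"
    for word, cls in analysis:
        total += math.log(TRANSITION.get((prev, cls), FLOOR))
        total += math.log(EMISSION.get((word, cls), FLOOR))
        prev = cls
    return total


def best_analysis(candidates):
    """Return the candidate analysis with the highest log-probability."""
    return max(candidates, key=log_score)


# Two competing analyses of the same characters: three known words, or
# one unknown word (a person name) followed by a known word.
candidates = [
    [("王", "n"), ("小明", "n"), ("来", "v")],
    [("王小明", "nr"), ("来", "v")],
]
best = best_analysis(candidates)
```

With these toy values the second analysis scores higher, so the string 王小明 is kept as a single word. Because each word in the winning analysis carries a class label, selecting the analysis also assigns a part-of-speech tag to the newly identified word, which is the sense in which identification and tagging can be handled together.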