In this paper, we propose a unified mechanism for more precise detection of Chinese Out-of-Vocabulary (OOV) terms and for guessing their Part-of-Speech (POS) tags. The mechanism is formulated on the fusion of multiple features with supervised learning. In addition to the traditional features, we introduce new features for statistical information and global context, together with constraints and heuristic rules that capture the relationships among OOV term candidates. Experiments on Chinese corpora from both People's Daily and SIGHAN 2005 yield consistent results that are better than those of purely rule-based or purely statistics-based models. Further experiments that combine our model with Chinese monolingual retrieval on the TREC-9 data sets show a clear improvement in retrieval performance as well.
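The abstract does not specify which features, learner, or constraints are used, so the following is only a minimal sketch of what "fusing statistical and global-context features" for OOV candidates could look like. Every name, feature, and weight below is a hypothetical assumption, not the paper's method. The global-context features are computed over every occurrence of a candidate string in the corpus: its frequency and the entropy of the characters to its left and right (a common boundary statistic for unknown words). A linear-logistic scorer stands in for the supervised model. A greedy rule standing in for the paper's unspecified constraints keeps only non-overlapping candidates, preferring higher scores.

```python
import math
from collections import Counter


def occurrences(corpus, cand):
    """Start offsets of every (possibly overlapping) occurrence of cand in corpus."""
    starts, i = [], corpus.find(cand)
    while i != -1:
        starts.append(i)
        i = corpus.find(cand, i + 1)
    return starts


def entropy(counts):
    """Shannon entropy (bits) of a Counter of neighbouring characters."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def candidate_features(corpus, cand):
    """Hypothetical global-context features gathered over all occurrences of cand."""
    starts = occurrences(corpus, cand)
    end = len(cand)
    left = Counter(corpus[i - 1] for i in starts if i > 0)
    right = Counter(corpus[i + end] for i in starts if i + end < len(corpus))
    return {
        "freq": float(len(starts)),
        "left_entropy": entropy(left),
        "right_entropy": entropy(right),
        "length": float(len(cand)),
    }


def score(features, weights, bias=0.0):
    """Logistic score of a fused feature vector; the weights would be learned from labeled data."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def resolve_overlaps(spans):
    """Hypothetical constraint: keep non-overlapping (start, end, score) spans, best score first."""
    kept = []
    for s, e, p in sorted(spans, key=lambda t: -t[2]):
        if all(e <= ks or s >= ke for ks, ke, _ in kept):
            kept.append((s, e, p))
    return sorted(kept)
```

In this sketch, a candidate that appears many times with varied left and right neighbours receives high entropy values, which is typical of a standalone word. In a real system the weights in `score` would be fitted on annotated data such as People's Daily. They are not hand-set as any call here would imply.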