Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Covering ambiguity resolution in Chinese word segmentation based on contextual information
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
A collocation-based WSD model: RFR-SUM
IEA/AIE'07 Proceedings of the 20th international conference on Industrial, engineering, and other applications of applied intelligent systems
Ambiguity processing is an important factor in the accuracy of word segmentation, and combinational ambiguity is one of its central problems. In this paper, we take a machine learning approach: we select appropriate features and use five efficient classification models (RFR_SUM, CRF, Naive Bayes, KNN, and RBF) to resolve combinational ambiguity. Four strategies for combining the classifiers into an ensemble (product, average, max, and majority voting) are applied in our experiments. Twenty typical combinationally ambiguous words are tested on a half-year corpus of the 1998 "People's Daily", and the best average F-score achieved is 98.02%. The results show that ensemble methods, which make full use of contextual information such as words, frequencies, and parts of speech, can effectively improve disambiguation accuracy.
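The four combining strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `combine`, the two-class setup (0 = keep the string as one word, 1 = split it), and the probability values in `rows` are all assumptions made for illustration, and each base classifier is assumed to output a probability for every class.

```python
from collections import Counter
from math import prod  # Python 3.8+


def combine(prob_rows, rule):
    """Return the class index chosen by an ensemble of classifiers.

    prob_rows: one list of class probabilities per classifier (hypothetical input).
    rule: "product", "average", "max", or "majority".
    Ties are broken in favour of the lowest class index.
    """
    n_classes = len(prob_rows[0])
    # Regroup so that columns[c] holds every classifier's probability for class c.
    columns = [[row[c] for row in prob_rows] for c in range(n_classes)]

    if rule == "product":
        scores = [prod(col) for col in columns]
    elif rule == "average":
        scores = [sum(col) / len(col) for col in columns]
    elif rule == "max":
        scores = [max(col) for col in columns]
    elif rule == "majority":
        # Each classifier casts one vote for its own most probable class.
        votes = Counter(
            max(range(n_classes), key=row.__getitem__) for row in prob_rows
        )
        scores = [votes.get(c, 0) for c in range(n_classes)]
    else:
        raise ValueError(f"unknown combining rule: {rule}")

    return max(range(n_classes), key=scores.__getitem__)


# Illustrative values only: three classifiers, two classes.
rows = [
    [0.90, 0.10],  # classifier A is confident in class 0
    [0.40, 0.60],  # classifier B leans towards class 1
    [0.45, 0.55],  # classifier C leans towards class 1
]

for rule in ("product", "average", "max", "majority"):
    print(rule, combine(rows, rule))
```

With these made-up inputs, the product, average, and max rules all choose class 0 because one classifier is very confident, while majority voting chooses class 1 because two of the three classifiers prefer it. This shows why the choice of combining strategy can change the ensemble's decision.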