Covering ambiguity resolution in Chinese word segmentation based on contextual information

Authors:
Xiao Luo; Maosong Sun; Benjamin K. Tsou
Affiliations:
Tsinghua University, Beijing, China; Tsinghua University, Beijing, China; City University of Hong Kong, Hong Kong
Venue:
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Year:
2002

Abstract

Covering ambiguity is one of the two basic types of ambiguity in Chinese word segmentation. We treat its resolution as equivalent to word sense disambiguation and use the classical vector space model from information retrieval to represent the contexts of ambiguous words. We propose a variant of TF-IDF weighting and additionally use a Chinese thesaurus to cope with the data sparseness problem. We select 90 frequent cases of covering ambiguity as targets. The training set contains 77,654 sentences and the test set contains 19,242 sentences. In our experiments, the model achieves 96.58% accuracy, outperforming both the original form of TF-IDF weighting and another baseline, the hidden Markov model.
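
The general idea of treating a covering ambiguity as a sense-classification problem over context vectors can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a standard smoothed TF-IDF weight rather than the variant proposed in the paper, it omits the thesaurus step, and the function names (`build_sense_vectors`, `cosine`, `classify`) and the toy contexts are hypothetical. The string 才能 serves only as an illustrative case (one word, "talent", or two words, 才 + 能, "only then can"); it is not claimed to be among the paper's 90 targets.

```python
import math
from collections import Counter


def build_sense_vectors(training):
    """Build one TF-IDF vector per segmentation choice.

    `training` maps a label (e.g. "one_word") to a list of tokenized
    contexts in which that segmentation was observed.
    """
    # Pool every context of a label into one pseudo-document of term counts.
    tf = {label: Counter(tok for ctx in contexts for tok in ctx)
          for label, contexts in training.items()}
    # Document frequency: number of labels whose pseudo-document has the term.
    df = Counter(tok for counts in tf.values() for tok in counts)
    n = len(tf)

    def idf(tok):
        # Smoothed IDF (an assumption here, not the paper's variant).
        return math.log((1 + n) / (1 + df.get(tok, 0))) + 1.0

    vectors = {label: {tok: count * idf(tok) for tok, count in counts.items()}
               for label, counts in tf.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(context, vectors, idf):
    """Return the label whose vector is most similar to the context."""
    query = {tok: count * idf(tok) for tok, count in Counter(context).items()}
    return max(vectors, key=lambda label: cosine(query, vectors[label]))


# Invented toy contexts around the ambiguous string (target itself removed).
training = {
    "one_word": [["他", "很", "有", "领导"], ["艺术", "有", "天赋"]],
    "two_words": [["这样", "做", "解决", "问题"], ["只有", "努力", "成功"]],
}
vectors, idf = build_sense_vectors(training)
print(classify(["只有", "这样", "成功"], vectors, idf))
print(classify(["他", "有", "天赋"], vectors, idf))
```

Pooling all contexts of a label into a single vector keeps the sketch short; a nearest-neighbour comparison against individual training contexts would be an equally plausible reading of the vector space formulation.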