Covering ambiguity is one of the two basic types of ambiguity in Chinese word segmentation. We treat its resolution as equivalent to word sense disambiguation and use the classical vector space model from information retrieval to represent the contexts of ambiguous words. We propose a variant of TFIDF weighting and additionally use a Chinese thesaurus to cope with the data sparseness problem. We select 90 frequent cases of covering ambiguity as targets. The training set contains 77654 sentences and the test set contains 19242 sentences. Experimental results show that our model achieves 96.58% accuracy, outperforming both the original form of TFIDF weighting and a second baseline, the hidden Markov model.
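To make the general idea concrete, the following is a minimal sketch of resolving one covering ambiguity with a vector space model. It is not the weighting proposed in the paper: it uses plain TF-IDF (the "original form" named above as a baseline), treats each reading as one document, and omits the thesaurus step. The function names, the labels `one_word` and `two_words`, and the toy contexts for the string 才能 are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training data for the ambiguous string "才能":
# each item is (reading, context words around the ambiguous string).
EXAMPLES = [
    ("one_word", ["他", "的", "很", "出众"]),
    ("one_word", ["发挥", "的", "艺术"]),
    ("two_words", ["只有", "努力", "成功"]),
    ("two_words", ["这样", "的", "努力", "取得", "进步"]),
]

def train(examples):
    """Build one plain TF-IDF vector per reading (each reading acts as a document)."""
    tf = {}
    for label, words in examples:
        tf.setdefault(label, Counter()).update(words)
    # Document frequency: in how many readings does each context word occur?
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())
    n = len(tf)
    idf = {w: math.log(n / d) for w, d in df.items()}
    vectors = {
        label: {w: c * idf[w] for w, c in counts.items()}
        for label, counts in tf.items()
    }
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def classify(context, vectors, idf):
    """Return the reading whose vector is most similar to the context vector."""
    counts = Counter(w for w in context if w in idf)  # ignore unseen words
    query = {w: c * idf[w] for w, c in counts.items()}
    return max(vectors, key=lambda label: cosine(query, vectors[label]))

vectors, idf = train(EXAMPLES)
print(classify(["努力", "成功"], vectors, idf))
print(classify(["他", "出众"], vectors, idf))
```

In this sketch a context word that occurs with both readings (here 的) receives zero weight, so only words that discriminate between the readings influence the decision. A context made up entirely of unseen words gives zero similarity to every reading, which is the kind of sparseness that a thesaurus-based backoff is meant to reduce.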