ALR7 Proceedings of the 7th Workshop on Asian Language Resources
Word segmentation is an essential step in building natural language applications such as machine translation, text summarization, and cross-lingual information retrieval. For certain Asian languages in which word boundaries are not clearly marked, recognizing words can be very challenging. One serious problem is word boundary ambiguity. In this paper, we investigate the use of Linear Support Vector Machines (LSVM) for word boundary disambiguation. We show empirically, for Vietnamese, that LSVM obtains better results than the trigram language model approach.