Query segmentation is an essential step in query processing. It groups the words of a query into semantic segments, which helps the search engine improve retrieval precision. In this paper, we present a novel unsupervised learning approach to query segmentation based on similarity in the principal eigenspace of a query-word-frequency matrix derived from web statistics. Experimental results show that our approach outperforms two baselines, the MI (Mutual Information) approach and the EM optimization approach, by 35.8% and 17.7% in F-measure, respectively.
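The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a symmetric word-by-word count matrix, projects each query word onto the top-k eigenvectors, and starts a new segment wherever the cosine similarity between adjacent words falls below a threshold. The function name `segment_query`, the parameters `k` and `threshold`, the similarity rule, and the toy counts are all hypothetical.

```python
import numpy as np

def segment_query(words, freq, k=2, threshold=0.5):
    """Split `words` into segments via similarity in the top-k eigenspace.

    `freq` is an assumed symmetric n x n matrix of co-occurrence counts
    for the n query words (illustrative; not the paper's exact matrix).
    """
    if not words:
        return []
    m = np.asarray(freq, dtype=float)
    # eigh handles symmetric matrices; eigenvalues come back ascending.
    vals, vecs = np.linalg.eigh(m)
    top = np.argsort(vals)[::-1][:k]
    # Row i holds the coordinates of word i in the principal eigenspace.
    proj = vecs[:, top] * np.sqrt(np.abs(vals[top]))

    segments = [[words[0]]]
    for i in range(1, len(words)):
        a, b = proj[i - 1], proj[i]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sim = float(a @ b / denom) if denom > 0 else 0.0
        if sim >= threshold:
            segments[-1].append(words[i])   # same segment as previous word
        else:
            segments.append([words[i]])     # low similarity: start a new one
    return segments

# Toy counts (made up): "new"/"york" co-occur often, as do "hotel"/"deals".
toy = [[10, 9, 0, 0],
       [9, 10, 0, 0],
       [0, 0, 10, 8],
       [0, 0, 8, 10]]
print(segment_query(["new", "york", "hotel", "deals"], toy))
# → [['new', 'york'], ['hotel', 'deals']]
```

Cosine similarity is used here because it is unaffected by the arbitrary sign of each eigenvector; the choice of `k` and `threshold` would need tuning on real web statistics.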