Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
The probability ranking principle in IR
Readings in information retrieval
A study of smoothing methods for language models applied to Ad Hoc information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Biterm language models for document retrieval
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Cumulated gain-based evaluation of IR techniques
ACM Transactions on Information Systems (TOIS)
Self-Supervised Chinese Word Segmentation
IDA '01 Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis
Optimizing search engines using clickthrough data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Fast statistical parsing of noun phrases for document indexing
ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Dependence language model for information retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Integrating word relationships into language models
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
A Markov random field model for term dependencies
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Implicit user modeling for personalized search
Proceedings of the 14th ACM international conference on Information and knowledge management
Generating query substitutions
Proceedings of the 15th international conference on World Wide Web
Improving web search ranking by incorporating user behavior information
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Optimisation methods for ranking functions with multiple parameters
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
An exploration of proximity measures in information retrieval
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Unsupervised query segmentation using generative language models and wikipedia
Proceedings of the 17th international conference on World Wide Web
Query segmentation using conditional random fields
Proceedings of the First International Workshop on Keyword Search on Structured Data
Optimizing two stage bigram language models for IR
Proceedings of the 19th international conference on World wide web
The power of naive query segmentation
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
An overview of Microsoft web N-gram corpus and applications
HLT-DEMO '10 Proceedings of the NAACL HLT 2010 Demonstration Session
Boosting web retrieval through query operations
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Unsupervised extraction of template structure in web search queries
Proceedings of the 21st international conference on World Wide Web
A generalized hidden Markov model with discriminative training for query spelling correction
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
An IR-based evaluation framework for web search query segmentation
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Extending BM25 with multiple query operators
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
An analysis of free-text queries for a multi-field web form
Proceedings of the 4th Information Interaction in Context Symposium
Towards optimum query segmentation: in doubt without
Proceedings of the 21st ACM international conference on Information and knowledge management
Analyzing linguistic structure of web search queries
Proceedings of the 22nd international conference on World Wide Web companion
On segmentation of eCommerce queries
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Mining search and browse logs for web search: A Survey
ACM Transactions on Intelligent Systems and Technology (TIST) - Survey papers, special sections on the semantic adaptive social web, intelligent systems for health informatics, regular papers
Towards Concept-Based Translation Models Using Search Logs for Query Expansion
Proceedings of the 21st ACM international conference on Information and knowledge management
Hi-index | 0.00 |
Query segmentation is an important task toward understanding queries accurately, which is essential for improving search results. Existing segmentation models either use labeled data to predict the segmentation boundaries, for which the training data is expensive to collect, or employ unsupervised strategy based on a large text corpus, which might be inaccurate because of the lack of relevant information. In this paper, we propose a probabilistic model to exploit clickthrough data for query segmentation, where the model parameters are estimated via an efficient EM algorithm. We further study how to properly interpret the segmentation results and utilize them to improve retrieval accuracy. Specifically, we propose an integrated language model based on the standard bigram language model to exploit the probabilistic structure obtained through query segmentation. Experiment results on two datasets show that our segmentation model outperforms existing segmentation models. Furthermore, extensive experiments on a large retrieval dataset reveals that the results of query segmentation can be leveraged to improve retrieval relevance by using the proposed integrated language model.