Towards optimum query segmentation: in doubt without

Authors:
Matthias Hagen;Martin Potthast;Anna Beyer;Benno Stein
Affiliations:
Bauhaus-Universität Weimar, Weimar, Germany;Bauhaus-Universität Weimar, Weimar, Germany;Bauhaus-Universität Weimar, Weimar, Germany;Bauhaus-Universität Weimar, Weimar, Germany
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

Query segmentation is the problem of identifying those keywords in a query that together form compound concepts or phrases, such as "new york times". Such segments can help a search engine better interpret a user's intent and tailor the search results more appropriately. Our contributions to this problem are threefold. (1) We conduct the first large-scale study of human segmentation behavior, based on more than 500,000 segmentations. (2) We show that the traditionally applied segmentation accuracy measures are not appropriate for such large-scale corpora, and we introduce new, more robust measures. (3) We develop a new query segmentation approach built on the basic idea that, in cases of doubt, it is often better to leave queries (partially) without any segmentation. This new in-doubt-without approach chooses different segmentation strategies depending on the query type. A large-scale evaluation shows substantial improvement over the state of the art in terms of segmentation accuracy. To draw a complete picture, we also evaluate the impact of segmentation strategies on retrieval performance in a TREC setting. It turns out that more accurate segmentation does not necessarily yield better retrieval performance. Based on this insight, we propose an in-doubt-without variant that achieves the best retrieval performance despite leaving many queries unsegmented. But there is still room for improvement: the optimum segmentation strategy, which always chooses the segmentation that maximizes retrieval performance, significantly outperforms all other tested approaches.
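To make the "in doubt, without" idea concrete, here is a minimal sketch. It is not the authors' method: the abstract does not specify the scoring function or the query-type rules, so the length-weighted phrase-count score, the `margin` threshold, the function names, and the `TOY_COUNTS` table are all assumptions chosen purely for illustration. The sketch enumerates every segmentation of a query and keeps the best-scoring one only if it clearly beats the runner-up; otherwise it returns the query unsegmented.

```python
from itertools import combinations

# Toy phrase counts; purely illustrative, not real n-gram data.
TOY_COUNTS = {
    ("new", "york"): 90,
    ("new", "york", "times"): 80,
    ("york", "times"): 20,
}


def segmentations(words):
    """Yield every split of `words` into contiguous segments (2^(n-1) in total)."""
    n = len(words)
    for k in range(n):
        for breaks in combinations(range(1, n), k):
            bounds = (0, *breaks, n)
            yield [tuple(words[i:j]) for i, j in zip(bounds, bounds[1:])]


def score(segmentation, counts):
    """Hypothetical score: length-weighted counts of the multi-word segments."""
    return sum(len(s) * counts.get(s, 0) for s in segmentation if len(s) > 1)


def in_doubt_without(query, counts, margin=1.2):
    """Return the best segmentation, or the unsegmented query when in doubt.

    "In doubt" is an assumed rule here: no known phrase at all, or a best
    score that is less than `margin` times the runner-up's score.
    """
    words = query.split()
    if not words:
        return []
    unsegmented = [(w,) for w in words]
    ranked = sorted(segmentations(words), key=lambda s: score(s, counts), reverse=True)
    best_score = score(ranked[0], counts)
    runner_up = score(ranked[1], counts) if len(ranked) > 1 else 0
    if best_score == 0 or best_score < margin * runner_up:
        return unsegmented
    return ranked[0]


print(in_doubt_without("new york times", TOY_COUNTS))
print(in_doubt_without("new york times", TOY_COUNTS, margin=1.5))
```

With the toy counts, the first call keeps "new york times" as a single segment, while the stricter margin in the second call treats the lead over "new york" + "times" as too small and leaves the query unsegmented. A real system would replace the toy table with web-scale n-gram or query-log statistics and tune the doubt criterion per query type.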