Query segmentation revisited

Authors:
Matthias Hagen;Martin Potthast;Benno Stein;Christof Bräutigam
Affiliations:
Bauhaus-Universität, Weimar, Germany;Bauhaus-Universität, Weimar, Germany;Bauhaus-Universität, Weimar, Germany;Bauhaus-Universität, Weimar, Germany
Venue:
Proceedings of the 20th international conference on World wide web
Year:
2011

Abstract

We address the problem of query segmentation: given a keyword query, the task is to group the keywords into phrases where possible. Previous approaches achieve reasonable segmentation performance but have been tested only against a small corpus of manually segmented queries. In addition, many of them are fairly intricate: they rely on expensive features and are difficult to reimplement. The main contribution of this paper is a new method for query segmentation that is easy to implement, fast, and achieves a segmentation accuracy comparable to current state-of-the-art techniques. Our method uses only raw web n-gram frequencies and Wikipedia titles, both stored in a hash table. We also introduce a new evaluation corpus for query segmentation. With about 50,000 human-annotated queries, it is two orders of magnitude larger than the corpus used so far.
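
To make the idea of segmenting with nothing but hash-table lookups concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not state the scoring function, so the length-based weight, the `WIKI_BOOST` factor, and the toy entries in `NGRAM_COUNTS` and `WIKI_TITLES` are all assumptions made for illustration only. The sketch enumerates every segmentation of a query and keeps the one with the highest score.

```python
from itertools import combinations

# Hypothetical toy data. A real system would load web n-gram frequencies
# and Wikipedia titles into hash tables; plain dict/set stand in for them.
NGRAM_COUNTS = {
    "new york": 1000,
    "york times": 400,
    "new york times": 100,
    "times square": 500,
}
WIKI_TITLES = {"new york", "new york times", "times square"}
WIKI_BOOST = 2  # assumed bonus for segments that are Wikipedia titles


def segmentations(words):
    """Yield every split of the word list into contiguous segments."""
    n = len(words)
    for k in range(n):  # k = number of break points
        for breaks in combinations(range(1, n), k):
            bounds = (0, *breaks, n)
            yield [" ".join(words[a:b]) for a, b in zip(bounds, bounds[1:])]


def score(segmentation):
    """Sum an assumed weight over all multi-word segments."""
    total = 0
    for seg in segmentation:
        length = len(seg.split())
        if length < 2:
            continue  # single keywords contribute nothing
        count = NGRAM_COUNTS.get(seg, 0)
        if seg in WIKI_TITLES:
            count *= WIKI_BOOST
        # Assumed weighting: longer segments get a larger multiplier so
        # they can compete with their more frequent sub-phrases.
        total += (length ** length) * count
    return total


def segment(query):
    """Return the highest-scoring segmentation of a keyword query."""
    words = query.split()
    if not words:
        return []
    return max(segmentations(words), key=score)


if __name__ == "__main__":
    print(segment("new york times square"))
```

A query with n keywords has 2^(n-1) segmentations, so exhaustive enumeration as done here is only practical because web queries are typically short.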