The linguistic structure of English web-search queries

Authors:
Cory Barr; Rosie Jones; Moira Regelson
Affiliations:
Yahoo! Inc., Sunnyvale, CA; Yahoo! Inc., Sunnyvale, CA; Perfect Market, Inc., Pasadena, CA
Venue:
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2008

Abstract

Web-search queries are known to be short, but little else is known about their structure. In this paper we investigate the applicability of part-of-speech tagging to typical English-language web-search queries and the potential value of these tags for improving search results. We begin by identifying a set of part-of-speech tags suitable for search queries and quantifying their occurrence. We find that proper nouns constitute 40% of query terms, and that proper nouns and nouns together constitute over 70% of query terms. We also show that the majority of queries are noun phrases rather than unstructured collections of terms. We then train a Brill tagger on a set of queries manually labeled with these tags and evaluate its performance. In addition, we investigate classifying search queries into grammatical classes based on the syntax of their part-of-speech tag sequences. Finally, we conduct preliminary experiments on the practical applicability of query-trained part-of-speech taggers to information-retrieval tasks. In particular, we show that part-of-speech information can be a significant feature in machine-learned search-result relevance. These experiments also explore using the tagger to select words for omission or substitution in query reformulation, actions that can improve recall. We conclude that a part-of-speech tagger trained on labeled corpora of queries significantly outperforms taggers trained on traditional corpora, and that leveraging the unique linguistic structure of web-search queries can improve the search experience.
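Two of the measurements described above, the share of each part-of-speech tag among query terms and the classification of whole queries by their tag sequences, can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' code: the four tagged queries are invented toy data, the tags are assumed Penn Treebank labels rather than the query-specific tag set the paper identifies, and `NP_PATTERN` is a deliberately simplified, hypothetical noun-phrase grammar.

```python
from collections import Counter
import re

# Hypothetical toy data: queries hand-tagged with Penn Treebank-style tags.
# Illustrative only; not the paper's labeled corpus or its tag set.
TAGGED_QUERIES = [
    [("britney", "NNP"), ("spears", "NNP")],
    [("digital", "JJ"), ("camera", "NN"), ("reviews", "NNS")],
    [("how", "WRB"), ("to", "TO"), ("tie", "VB"), ("a", "DT"), ("tie", "NN")],
    [("download", "VB"), ("free", "JJ"), ("music", "NN")],
]

PROPER_NOUN_TAGS = {"NNP", "NNPS"}
NOUN_TAGS = {"NN", "NNS"}

# Assumed simplified noun-phrase grammar over the space-joined tag string:
# optional determiner, any pre-modifiers (adjectives, numbers, nouns),
# then a noun head. A realistic grammar would also allow e.g. trailing PPs.
NP_PATTERN = re.compile(r"(DT )?((JJ|CD|NN|NNS|NNP|NNPS) )*(NN|NNS|NNP|NNPS) ")


def tag_shares(queries):
    """Return the fraction of all query tokens that carry each tag."""
    counts = Counter(tag for query in queries for _, tag in query)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def is_noun_phrase(query):
    """Classify a tagged query as a noun phrase if its tag sequence fits NP_PATTERN."""
    tag_string = "".join(tag + " " for _, tag in query)
    return NP_PATTERN.fullmatch(tag_string) is not None


if __name__ == "__main__":
    shares = tag_shares(TAGGED_QUERIES)
    proper = sum(shares.get(t, 0.0) for t in PROPER_NOUN_TAGS)
    nouns = proper + sum(shares.get(t, 0.0) for t in NOUN_TAGS)
    np_rate = sum(is_noun_phrase(q) for q in TAGGED_QUERIES) / len(TAGGED_QUERIES)
    print(f"proper nouns: {proper:.1%}, proper + other nouns: {nouns:.1%}, "
          f"noun-phrase queries: {np_rate:.0%}")
```

On real data the tags would come from a tagger (the paper trains a Brill tagger on hand-labeled queries) rather than being supplied by hand. Matching a regular expression against the joined tag string is only one simple way to encode a tag-sequence grammar; it keeps the rules easy to inspect and extend.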