Understanding the specificity of web search queries

Authors:
Carolyn Theresa Hafernik;Bernard J. Jansen
Affiliations:
The Pennsylvania State University, University Park, PA, USA;The Pennsylvania State University, University Park, Pennsylvania, USA
Venue:
CHI '13 Extended Abstracts on Human Factors in Computing Systems
Year:
2013

Abstract

Understanding the specificity of web search queries can help search systems better address the underlying needs of searchers and provide them with relevant content. The goal of this work is to automatically determine the specificity of web search queries. Although many factors may affect query specificity, we investigate two in this research: (1) part of speech and (2) query length. We use content analysis and prior research to develop a list of nine attributes for identifying query specificity. The attributes include whether a query contains a URL, contains a location or place name along with additional terms, compares multiple things, contains multiple distinct ideas or topics, asks a question that has a clear answer, requests directions, instructions, or tips, contains a specific date along with additional terms, or contains a name along with additional terms. We apply these attributes to classify 5,115 unique queries as narrow or general, and then analyze the differences between narrow and general queries based on part of speech and query length. Our results indicate that query length and part-of-speech usage, by themselves, can distinguish narrow from general queries. We discuss the implications of this work for search engines, marketers, and users.
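
As a rough illustration of the attribute-based labeling described in the abstract, the following minimal Python sketch marks a query as narrow when it matches any of a few surface patterns and then compares mean query length across the two classes. It is not the authors' procedure: the regular expressions, the attribute names, the rule that any single match makes a query narrow, and the functions `matched_attributes`, `classify`, and `mean_length_by_class` are all assumptions made for this example. It covers only four of the listed attributes, and a four-digit year stands in crudely for "a specific date".

```python
import re
from statistics import mean

# Hypothetical surface patterns; the abstract does not say how attributes were detected.
URL_RE = re.compile(r"(https?://|www\.)\S+|\b\S+\.(com|org|net|edu|gov)\b", re.IGNORECASE)
COMPARE_RE = re.compile(r"\b(vs|versus|compared (to|with))\b", re.IGNORECASE)
HOWTO_RE = re.compile(r"\b(how to|directions to|instructions for|tips (for|on))\b", re.IGNORECASE)
YEAR_RE = re.compile(r"\b(19|20)\d{2}\b")  # crude proxy for "a specific date"


def matched_attributes(query: str) -> list[str]:
    """Return the names of the (assumed) specificity attributes a query matches."""
    tokens = query.split()
    found = []
    if URL_RE.search(query):
        found.append("url")
    if COMPARE_RE.search(query):
        found.append("comparison")
    if HOWTO_RE.search(query):
        found.append("directions_instructions_tips")
    # A date only counts when it appears along with additional terms.
    if YEAR_RE.search(query) and len(tokens) > 1:
        found.append("date_plus_terms")
    return found


def classify(query: str) -> str:
    """Label a query 'narrow' if any attribute matches, otherwise 'general' (assumed rule)."""
    return "narrow" if matched_attributes(query) else "general"


def mean_length_by_class(queries: list[str]) -> dict[str, float]:
    """Mean query length in whitespace-separated terms for each non-empty class."""
    lengths: dict[str, list[int]] = {"narrow": [], "general": []}
    for q in queries:
        lengths[classify(q)].append(len(q.split()))
    return {label: mean(vals) for label, vals in lengths.items() if vals}


if __name__ == "__main__":
    sample = ["shoes", "how to change a tire", "iphone vs android battery", "weather"]
    for q in sample:
        print(q, "->", classify(q), matched_attributes(q))
    print(mean_length_by_class(sample))
```

A part-of-speech comparison would follow the same shape, with a tagger's output counted per class in place of term counts. The remaining attributes, such as place names and personal names, would likely need a gazetteer or named-entity recognizer rather than regular expressions.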