Query reformulation mining: models, patterns, and applications

  • Authors: Paolo Boldi; Francesco Bonchi; Carlos Castillo; Sebastiano Vigna

  • Affiliations: DSI, Università degli studi di Milano, Milan, Italy 20135; Yahoo! Research, Barcelona, Spain 080018; Yahoo! Research, Barcelona, Spain 080018; DSI, Università degli studi di Milano, Milan, Italy 20135

  • Venue: Information Retrieval
  • Year: 2011

Abstract

Understanding query reformulation patterns is a key step toward next-generation web search engines: it would allow us to build systems that understand, and possibly predict, user intent, provide the needed assistance at the right time, and thus help users locate information more effectively and improve their web-search experience. As a step in this direction, we build a very accurate model for classifying user query reformulations into four broad classes (generalization, specialization, error correction, or parallel move), achieving 92% accuracy. We then apply the model to automatically label two very large query logs sampled from different geographic areas, containing a total of approximately 17 million query reformulations. We study the resulting reformulation patterns, confirming some results from previous studies performed on smaller, manually annotated datasets and discovering new reformulation patterns, including connections between reformulation types and topical categories. We annotate two large query-flow graphs with reformulation-type information and run several graph-characterization experiments on them, extracting new insights about the relationships between the different query reformulation types. Finally, we study query recommendations based on short random walks on the query-flow graphs. Our experiments show that these methods match, and often improve on, the precision of recommendations based on query-click graphs, without requiring users' clicks. Our experiments also show that taking the transition-type labels on edges into account is important for obtaining good-quality recommendations.
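
To make the last idea concrete, the following is a minimal sketch of recommending queries with a short random walk over a query-flow graph whose edges carry reformulation-type labels. It is not the authors' implementation: the toy graph, the edge weights, the function names (`transition_probs`, `recommend`), and the choice to rank queries by visit probability accumulated over the steps are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy query-flow graph (illustrative data, not from the paper):
# (source query, target query) -> (transition weight, reformulation type)
EDGES = {
    ("jaguar", "jaguar car"): (3.0, "specialization"),
    ("jaguar", "jaguar animal"): (1.0, "specialization"),
    ("jaguar", "puma"): (4.0, "parallel_move"),
    ("jaguar car", "jaguar xf"): (2.0, "specialization"),
    ("jaguar car", "jaguar car price"): (2.0, "specialization"),
}


def transition_probs(edges, allowed_types):
    """Keep only edges with an allowed label and normalize weights per source."""
    weights = defaultdict(dict)
    for (src, dst), (weight, label) in edges.items():
        if label in allowed_types:
            weights[src][dst] = weights[src].get(dst, 0.0) + weight
    probs = {}
    for src, neighbours in weights.items():
        total = sum(neighbours.values())
        probs[src] = {dst: w / total for dst, w in neighbours.items()}
    return probs


def recommend(edges, start, allowed_types, steps=2):
    """Rank queries by the probability mass a short walk from `start` puts on them.

    Assumption: scores are summed over all steps of the walk.
    """
    probs = transition_probs(edges, allowed_types)
    current = {start: 1.0}
    score = defaultdict(float)
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in current.items():
            for dst, q in probs.get(node, {}).items():
                nxt[dst] += p * q
        for node, p in nxt.items():
            score[node] += p
        current = nxt
    score.pop(start, None)  # never recommend the query the user just typed
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for query, s in recommend(EDGES, "jaguar", {"specialization"}):
        print(f"{query}\t{s:.3f}")
```

Restricting `allowed_types` changes which queries the walk can reach: in this toy graph, allowing only specialization edges excludes "puma", whereas allowing every type makes it the top suggestion. This only illustrates why edge labels can matter; it says nothing about the quality figures reported in the paper.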