Automatic suggestion of query-rewrite rules for enterprise search

  • Authors:
  • Zhuowei Bao;Benny Kimelfeld;Yunyao Li

  • Affiliations:
  • University of Pennsylvania, Philadelphia, PA, USA;IBM Research - Almaden, San Jose, CA, USA;IBM Research - Almaden, San Jose, CA, USA

  • Venue:
  • SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Enterprise search is challenging for several reasons, notably the dynamic terminology and jargon that are specific to the enterprise domain. This challenge is partly addressed by having domain experts maintaining the enterprise search engine and adapting it to the domain specifics. Those administrators commonly address user complaints about relevant documents missing from the top matches. For that, it has been proposed to allow administrators to influence search results by crafting query-rewrite rules, each specifying how queries of a certain pattern should be modified or augmented with additional queries. Upon a complaint, the administrator seeks a semantically coherent rule that is capable of pushing the desired documents up to the top matches. However, the creation and maintenance of rewrite rules is highly tedious and time consuming. Our goal in this work is to ease the burden on search administrators by automatically suggesting rewrite rules. This automation entails several challenges. One major challenge is to select, among many options, rules that are ``natural'' from a semantic perspective (e.g., corresponding to closely related and syntactically complete concepts). Towards that, we study a machine-learning classification approach. The second challenge is to accommodate the cross-query effect of rules---a rule introduced in the context of one query can eliminate the desired results for other queries and the desired effects of other rules. We present a formalization of this challenge as a generic computational problem. As we show that this problem is highly intractable in terms of complexity theory, we present heuristic approaches and optimization thereof. In an experimental study within IBM intranet search, those heuristics achieve near-optimal quality and well scale to large data sets.