Automatic refinement of patent queries using concept importance predictors

  • Authors:
  • Parvaz Mahdabi; Linda Andersson; Mostafa Keikha; Fabio Crestani

  • Affiliations:
  • University of Lugano, Lugano, Switzerland; Vienna University of Technology, Vienna, Austria; University of Lugano, Lugano, Switzerland; University of Lugano, Lugano, Switzerland

  • Venue:
  • SIGIR '12: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2012

Abstract

Patent prior art queries are full patent applications, which are much longer than standard web search topics. Such queries contain hundreds of terms and do not express a focused information need. One way to make them more focused is to select a group of key terms as representatives. Previous work shows that this kind of query reduction is challenging, mainly because of the presence of ambiguous terms. In this setting, we present a query modeling approach that uses patent-specific characteristics to generate more precise queries. We propose to disambiguate query terms automatically using noun phrases extracted through a global analysis of the patent collection. We further introduce a method for predicting whether expansion with noun phrases will improve retrieval effectiveness. Our experiments show that performing query expansion using the true importance of the noun phrase queries yields an improvement of almost 20%. Based on this observation, we introduce various features for estimating the importance of a noun phrase query. We evaluated the proposed method on the CLEF-IP 2010 patent prior art search collection. The experimental results indicate that the proposed features are good predictors of noun phrase importance, and that selectively applying noun phrase queries based on these predictors outperforms existing query generation methods.
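The abstract does not specify several parts of the method: how key terms are chosen, which features feed the importance predictor, and how the noun phrase query is combined with the reduced query. The following is therefore only a minimal sketch of the general idea of *selective* expansion, and it rests on assumptions of our own:

- key terms are selected by tf-idf, a common reduction baseline rather than necessarily the authors' choice;
- expansion is a linear interpolation controlled by a hypothetical weight `np_weight`;
- expansion is applied only when a hypothetical `predicted_importance` score reaches a fixed `threshold`.

The score stands in for the output of any importance predictor. All function and parameter names are illustrative.

```python
import math
from collections import Counter


def select_key_terms(query_terms, doc_freq, num_docs, k):
    """Keep the k query terms with the highest tf-idf (assumed reduction step)."""
    tf = Counter(query_terms)

    def tfidf(term):
        # Smoothed idf; terms unseen in the collection get df = 0.
        df = doc_freq.get(term, 0)
        return tf[term] * math.log((num_docs + 1) / (df + 1))

    # Sort by descending tf-idf, breaking ties alphabetically for determinism.
    ranked = sorted(tf, key=lambda t: (-tfidf(t), t))
    return ranked[:k]


def build_query(key_terms, noun_phrases, predicted_importance,
                threshold=0.5, np_weight=0.3):
    """Return {term_or_phrase: weight}; weights sum to 1.

    Noun phrases are added only if the (hypothetical) predicted importance
    reaches the threshold; otherwise the reduced query is used unchanged.
    """
    if not key_terms:
        return {}
    # Uniform weights over the reduced query.
    weights = {t: 1.0 / len(key_terms) for t in key_terms}
    if predicted_importance >= threshold and noun_phrases:
        # Linear interpolation: (1 - np_weight) * reduced + np_weight * phrases.
        weights = {t: w * (1.0 - np_weight) for t, w in weights.items()}
        per_phrase = np_weight / len(noun_phrases)
        for phrase in noun_phrases:
            weights[phrase] = weights.get(phrase, 0.0) + per_phrase
    return weights


if __name__ == "__main__":
    terms = ["fastener", "fastener", "bolt", "the", "the", "the", "clamp"]
    df = {"the": 1000, "fastener": 10, "bolt": 50, "clamp": 20}
    keys = select_key_terms(terms, df, num_docs=1000, k=2)
    print(keys)
    print(build_query(keys, ["threaded fastener"], predicted_importance=0.8))
    print(build_query(keys, ["threaded fastener"], predicted_importance=0.2))
```

Gating on a single threshold keeps the sketch simple. Under these assumptions, a query whose predicted noun phrase importance is low keeps the reduced query as-is. This mirrors the abstract's point that expansion is applied selectively rather than to every query.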