Building queries for prior-art search

  • Authors:
  • Parvaz Mahdabi;Mostafa Keikha;Shima Gerani;Monica Landoni;Fabio Crestani

  • Affiliations:
  • Faculty of Informatics, University of Lugano, Lugano, Switzerland;Faculty of Informatics, University of Lugano, Lugano, Switzerland;Faculty of Informatics, University of Lugano, Lugano, Switzerland;Faculty of Informatics, University of Lugano, Lugano, Switzerland;Faculty of Informatics, University of Lugano, Lugano, Switzerland

  • Venue:
  • IRFC'11 Proceedings of the Second international conference on Multidisciplinary information retrieval facility
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Prior-art search is a critical step in the examination procedure of a patent application. This study explores automatic query generation from patent documents to facilitate the time-consuming and labor-intensive search for relevant patents. It is essential for this task to identify discriminative terms in different fields of a query patent, which enables us to distinguish relevant patents from non-relevant patents. To this end we investigate the distribution of terms occurring in different fields of the query patent and compare the distributions with the rest of the collection using language modeling estimation techniques. We experiment with term weighting based on the Kullback-Leibler divergence between the query patent and the collection and also with parsimonious language model estimation. Both of these techniques promote words that are common in the query patent and are rare in the collection. We also incorporate the classification assigned to patent documents into our model, to exploit available human judgements in the form of a hierarchical classification. Experimental results show that the retrieval using the generated queries is effective, particularly in terms of recall, while patent description is shown to be the most useful source for extracting query terms.