Simple pre and post processing strategies for patent searching in CLEF intellectual property track 2009

  • Authors:
  • Julien Gobeill;Emilie Pasche;Douglas Teodoro;Patrick Ruch

  • Affiliations:
  • University of Applied Sciences, Carouge, Switzerland;University and Hospitals of Geneva, Service d'Informatique Médicale, Genève, Switzerland;University and Hospitals of Geneva, Service d'Informatique Médicale, Genève, Switzerland;University of Applied Sciences, Carouge, Switzerland

  • Venue:
  • CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
  • Year:
  • 2009

Abstract

The objective of the 2009 CLEF-IP Track was to find documents that constitute prior art for a given patent. We explored a wide range of simple pre-processing and post-processing strategies, using Mean Average Precision (MAP) for evaluation. Once we had determined the best document representation, we tuned a classical Information Retrieval engine to perform the retrieval step. Finally, we explored two post-processing strategies. The first was to filter the retrieved patents by their IPC codes; in our experiments, using the complete IPC codes for filtering led to greater improvements than using 4-digit IPC codes. The second was to exploit the citations of retrieved patents in order to boost the scores of cited patents. Combining all selected strategies, we computed optimal runs that reached a MAP of 0.122 on the training set and a MAP of 0.129 on the official 2009 CLEF-IP XL set.
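
The abstract gives no implementation details, so the following is only a minimal Python sketch of how the two post-processing steps and the evaluation measure might look. Everything in it is an assumption made for illustration: the function names, the rule that a retrieved patent is kept when it shares at least one IPC code with the query patent, the reading of a "4-digit" code as the first four characters of the IPC code, the additive boost with a hypothetical `weight` parameter, and the choice to add cited patents that were not retrieved. Only `average_precision` follows a standard definition (MAP is its mean over all query patents).

```python
def filter_by_ipc(ranked, query_codes, doc_codes, prefix_len=None):
    """Keep (doc_id, score) pairs sharing at least one IPC code with the query.

    prefix_len=None compares complete codes; prefix_len=4 compares only the
    first four characters (assumed reading of a "4-digit" IPC code).
    """
    def norm(code):
        code = code.replace(" ", "")
        return code if prefix_len is None else code[:prefix_len]

    wanted = {norm(c) for c in query_codes}
    return [(d, s) for d, s in ranked
            if wanted & {norm(c) for c in doc_codes.get(d, ())}]


def boost_cited(ranked, citations, weight=0.5):
    """Add weight * score of each retrieved patent to the patents it cites.

    Assumption: cited patents missing from the ranking start from a score of 0.
    """
    boosted = dict(ranked)
    for doc_id, score in ranked:
        for cited in citations.get(doc_id, ()):
            boosted[cited] = boosted.get(cited, 0.0) + weight * score
    return sorted(boosted.items(), key=lambda pair: pair[1], reverse=True)


def average_precision(ranked_ids, relevant):
    """Standard average precision for one query."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Toy data, invented for this example only.
ranked = [("EP1", 3.0), ("EP2", 2.0), ("EP3", 1.0)]
doc_codes = {"EP1": ["H04L 29/06"], "EP2": ["H04L 12/28"], "EP3": ["A61K 31/00"]}
query_codes = ["H04L 29/06"]
citations = {"EP1": ["EP3"], "EP2": ["EP3", "EP9"]}

full_filtered = filter_by_ipc(ranked, query_codes, doc_codes)
short_filtered = filter_by_ipc(ranked, query_codes, doc_codes, prefix_len=4)
reranked = boost_cited(ranked, citations)
ap = average_precision([d for d, _ in reranked], {"EP3", "EP2"})
```

On this toy data, filtering on complete codes keeps only EP1, while the four-character prefix also keeps EP2. That illustrates the trade-off the abstract refers to: a stricter filter removes more candidates, which can help or hurt depending on how often true prior art shares the full code.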