PATATRAS: retrieval model combination and regression models for prior art search

  • Authors:
  • Patrice Lopez;Laurent Romary

  • Affiliations:
  • -;Humboldt Universität zu Berlin, Institut für Deutsche Sprache und Linguistik and INRIA

  • Venue:
  • CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
  • Year:
  • 2009

Abstract

This paper presents PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS), a system developed at Humboldt University for the IP track of CLEF 2009. Our approach has three main characteristics:

1. The use of multiple retrieval models and term index definitions for the three languages considered in the track, producing ten different sets of ranked results.
2. The merging of the different results based on multiple regression models, using an additional training set created from the patent collection.
3. The exploitation of patent metadata and citation structures to create restricted initial working sets of patents and to produce a final re-ranking regression model.

The resulting architecture allowed us to exploit information specific to patent documents efficiently while remaining generic and easy to extend.
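The regression-based merging of ranked results described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the regression, the features, or the training procedure, so everything below is an assumption made for illustration: ordinary least-squares linear regression, one feature per retrieval run (the run's score, or 0 if the document is absent), and the hypothetical helper names `fit_weights` and `merge_runs`. It is not the PATATRAS implementation.

```python
from typing import Dict, List, Sequence, Tuple


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(features: Sequence[Sequence[float]],
                labels: Sequence[float],
                ridge: float = 1e-6) -> List[float]:
    """Least-squares fit of: label ~ bias + sum_k w_k * score_k.

    Returns [bias, w_1, ..., w_K]. The small ridge term (an assumption of
    this sketch) only keeps the normal equations well conditioned.
    """
    rows = [[1.0] + list(f) for f in features]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(d):
        xtx[i][i] += ridge
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def merge_runs(runs: Sequence[Dict[str, float]],
               weights: Sequence[float]) -> List[Tuple[str, float]]:
    """Combine per-run scores into one ranking using the fitted weights."""
    doc_ids = set().union(*runs)  # every document seen in any run
    merged = []
    for doc in doc_ids:
        score = weights[0] + sum(
            w * run.get(doc, 0.0) for w, run in zip(weights[1:], runs)
        )
        merged.append((doc, score))
    # Highest combined score first; ties broken by document id.
    return sorted(merged, key=lambda pair: (-pair[1], pair[0]))


# Toy training set (made-up numbers): two runs, label 1.0 = relevant.
train_scores = [(0.9, 0.1), (0.8, 0.9), (0.2, 0.8), (0.1, 0.2)]
train_labels = [1.0, 1.0, 0.0, 0.0]
weights = fit_weights(train_scores, train_labels)

# Two hypothetical ranked result sets to merge.
runs = [{"A": 0.9, "B": 0.3}, {"B": 0.9, "C": 0.5}]
merged = merge_runs(runs, weights)
print([doc for doc, _ in merged])
```

In this toy data the first run separates relevant from non-relevant documents and the second does not, so the fit assigns the first run a much larger weight. A real system would train on a far larger set and would likely add features beyond raw scores, such as the metadata and citation information the abstract mentions.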