Preliminary study into query translation for patent retrieval

  • Authors:
  • Charles Jochim; Christina Lioma; Hinrich Schütze; Steffen Koch; Thomas Ertl

  • Affiliations:
  • Universität Stuttgart, Stuttgart, Germany (all authors)

  • Venue:
  • PaIR '10: Proceedings of the 3rd International Workshop on Patent Information Retrieval
  • Year:
  • 2010

Abstract

Patent retrieval is a branch of Information Retrieval (IR) that aims to support patent professionals in retrieving patents that satisfy their information needs. Patent granting bodies often require patents to be partially translated into one or more major foreign languages, so that language boundaries do not hinder their accessibility. This multilinguality of patent collections offers opportunities for improving patent retrieval. In this work we exploit these opportunities by applying query translation to patent retrieval. We expand monolingual patent queries with their translations, using both a domain-specific patent dictionary that we extract from the patent collection and a general, domain-free dictionary. Experimental evaluation on a standard CLEF-IP dataset shows that the two translation dictionaries yield similar results: query translation can help patent retrieval, but not always, and without great improvement over standard statistical monolingual query expansion (Rocchio). The improvement is greater when the source language is English rather than French or German, a finding due partly to the effect of complex French and German morphology on translation accuracy and partly to the prevalence of English in the collection. A thorough per-query analysis reveals that cases where standard query expansion fails (e.g. zero recall) can benefit from query translation.
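
The idea of expanding a monolingual query with its translations can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dictionary, the function name `expand_with_translations`, and the `translation_weight` parameter are all hypothetical, chosen only to show how translated terms might be added to a weighted query before it is passed to a retrieval system.

```python
from collections import Counter

# Hypothetical toy dictionary (English -> German/French), for illustration only.
# The paper extracts a domain-specific dictionary from the patent collection
# and also uses a general one; neither is reproduced here.
TOY_DICTIONARY = {
    "valve": ["ventil", "soupape"],
    "pressure": ["druck", "pression"],
}


def expand_with_translations(query_terms, dictionary, translation_weight=0.5):
    """Return a term -> weight mapping: original terms plus their translations.

    Original terms get weight 1.0 per occurrence. Each translation found in
    the dictionary is added with `translation_weight` (an assumed value).
    Terms without a dictionary entry are kept unchanged.
    """
    weights = Counter()
    for term in query_terms:
        key = term.lower()
        weights[key] += 1.0
        for translation in dictionary.get(key, []):
            weights[translation] += translation_weight
    return dict(weights)


expanded = expand_with_translations(
    ["pressure", "valve", "housing"], TOY_DICTIONARY
)
```

Down-weighting translated terms relative to the original ones is one possible design choice, meant to limit the effect of inaccurate translations; the abstract does not state how the original and translated terms are weighted.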