An investigation of decompounding for cross-language patent search

  • Authors:
  • Johannes Leveling;Walid Magdy;Gareth J.F. Jones

  • Affiliations:
  • Dublin City University, Dublin, Ireland;Dublin City University, Dublin, Ireland;Dublin City University, Dublin, Ireland

  • Venue:
  • Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Decompounding has been found to improve information retrieval (IR) effectiveness in general domains for languages such as German or Dutch. We investigate if cross-language patent retrieval can profit from decompounding. This poses two challenges: i) There may be few resources such as parallel corpora available for training an machine translation system for a compounding language. ii) Patents have a specific writing style and vocabulary ("patentese"), which may affect the performance of decompounding and translation methods. Experiments on data from the CLEF-IP 2010 task show that decompounding patents for translation can overcome out-of-vocabulary problems (OOV) and that decompounding improves IR performance significantly for small training corpora.