Genre and domain in patent texts

  • Authors:
  • Nelleke Oostdijk;Eva D'hondt;Hans van Halteren;Suzan Verberne

  • Affiliations:
  • Radboud University, Nijmegen, Netherlands (all authors)

  • Venue:
  • PaIR '10: Proceedings of the 3rd International Workshop on Patent Information Retrieval
  • Year:
  • 2010

Abstract

In this paper we investigate the variation in language use within the very broad patent domain. We find that language use (represented by syntactic phrases) not only differs from one patent class to the next, but is also a characteristic that sets apart the four sections of a patent (viz. Title, Abstract, Description and Claims). This lends support to the claim that these sections can be viewed as different text genres. For the development of a syntactic parser that is trained on patent texts, we quantify the domain and genre differences in terms of the amounts of text needed to train domain-dependent versions of the parser. Our quantified and exemplified findings on the domain variation in patent data are of interest for the patent retrieval and analysis communities.