Automated categorization in the international patent classification

  • Authors:
  • C. J. Fall; A. Törcsvári; K. Benzineb; G. Karetka

  • Affiliations:
  • ELCA Informatique SA, Avenue de la Harpe 22-24, CH-1000 Lausanne 13, Switzerland; Arcanum Development, Baranyai utca 10, H-1117 Budapest, Hungary; Metaread SA, 9 rue Boissonnas, CH-1227 Genève-Acacias, Switzerland; World Intellectual Property Organization, 34 Chemin des Colombettes, CH-1211 Genève 20, Switzerland

  • Venue:
  • ACM SIGIR Forum
  • Year:
  • 2003

Abstract

A new reference collection of patent documents for training and testing automated categorization systems is established and described in detail. This collection is tailored for automating the attribution of international patent classification codes to patent applications and is made publicly available for future research work. We report the results of applying a variety of machine learning algorithms to the automated categorization of English-language patent documents. This procedure involves a complex hierarchical taxonomy, within which we classify documents into 114 classes and 451 subclasses. Several measures of categorization success are described and evaluated. We investigate how best to resolve the training problems related to the attribution of multiple classification codes to each patent document.
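
To make the two classification levels concrete, the following is a minimal sketch of how an IPC symbol can be truncated to its class and subclass, and how a ranked prediction might be scored against a document that carries several codes. It relies on the standard layout of IPC symbols (section letter, two-digit class, subclass letter). The function names `ipc_levels` and `top_k_hit` are hypothetical, and the top-k check is only an illustrative success measure assumed here, not necessarily one of the measures the authors evaluate.

```python
import re

# An IPC symbol begins with a section letter (A-H), a two-digit class number,
# and a subclass letter, e.g. "A01B 1/02" -> class "A01", subclass "A01B".
_IPC_PREFIX = re.compile(r"^([A-H])(\d{2})([A-Z])")


def ipc_levels(symbol):
    """Return (class, subclass) for an IPC symbol such as 'A01B 1/02'."""
    match = _IPC_PREFIX.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not an IPC symbol: {symbol!r}")
    section, class_digits, subclass_letter = match.groups()
    ipc_class = section + class_digits
    return ipc_class, ipc_class + subclass_letter


def top_k_hit(ranked_predictions, true_codes, k=1):
    """True if any of the first k predicted categories is one of the
    codes attributed to the document (illustrative measure, assumed here)."""
    return any(code in true_codes for code in ranked_predictions[:k])
```

For example, `ipc_levels("H04L 9/00")` yields the class `H04` and the subclass `H04L`. A document labelled with both `H04L` and `G06F` counts as a hit under `top_k_hit` when either code appears among the first `k` predictions.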