Automatic extraction of function-behaviour-state information from patents

  • Authors:
  • G. Fantoni;R. Apreda;F. Dell'Orletta;M. Monge

  • Affiliations:
  • Department of Mechanical, Nuclear and Production Engineering, University of Pisa, Largo Lucio Lazzarino, 2, 56126 Pisa, Italy;Department of Energy and Systems Engineering, University of Pisa, Largo Lucio Lazzarino, 2, 56126 Pisa, Italy and Erre Quadro s.r.l., via S. Andrea, 59, I-56122 Pisa, Italy;Istituto di Linguistica Computazionale "Antonio Zampolli", ILC-CNR, via G. Moruzzi, 1 Località S. Cataldo, 56124 Pisa, Italy;Erre Quadro s.r.l., via S. Andrea, 59, I-56122 Pisa, Italy

  • Venue:
  • Advanced Engineering Informatics
  • Year:
  • 2013

Abstract

Patents contain a large quantity of technical information that is not available elsewhere, which makes them valuable to both academia and industry. This research aims to automatically detect and extract information about the functions, physical behaviours and states of a system directly from the text of a patent. These three categories form a well-known set of entities in engineering design theory, and studying them supports powerful analyses of individual artefacts as well as of groups of products or technologies. The focus is on providing a handy tool that speeds up and facilitates human analysis and makes it feasible to tackle large corpora of documents. A second goal is to develop a protocol based on free software and database resources, so that anyone can replicate it with limited effort and without relying on commercial databases. Extracting technical and design information from a document whose aim is more legal than technical, and which is written in a specific jargon, is not a trivial task. The approach chosen to overcome these issues is to support state-of-the-art computational linguistics tools with a large Knowledge Base. The latter has been constructed both manually and automatically and comprises not only keywords but also concepts, relationships and regular expressions. A case study on a very recent patent describing a mechanical device is included to show the functioning and output of the entire system.
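
To make the idea of a Knowledge Base of keywords and regular expressions more concrete, the following is a minimal sketch in Python. It is not the authors' system: the category patterns, the sentence splitter and the names `KNOWLEDGE_BASE` and `extract_fbs` are all hypothetical, chosen only to illustrate how pattern entries could tag function, behaviour and state mentions in patent text. A real pipeline would rely on proper linguistic analysis rather than plain regular expressions.

```python
import re
from collections import defaultdict

# Hypothetical miniature knowledge base. These patterns are illustrative
# assumptions, not entries from the Knowledge Base described in the abstract.
KNOWLEDGE_BASE = {
    "function": [r"\b(?:grip|clamp|rotat|transmit|releas)\w*\b"],
    "behaviour": [r"\b(?:friction|elastic deformation|adhesion|vibration)\b"],
    "state": [
        r"\b(?:open|closed|locked|engaged|released) "
        r"(?:position|state|configuration)\b"
    ],
}

# Compile every pattern once, ignoring case.
COMPILED = {
    category: [re.compile(p, re.IGNORECASE) for p in patterns]
    for category, patterns in KNOWLEDGE_BASE.items()
}


def split_sentences(text):
    """Naive sentence splitter: break after '.' or ';' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.;])\s+", text) if s.strip()]


def extract_fbs(text):
    """Return a dict mapping each category to the lower-cased matches found."""
    hits = defaultdict(list)
    for sentence in split_sentences(text):
        for category, patterns in COMPILED.items():
            for pattern in patterns:
                for match in pattern.finditer(sentence):
                    hits[category].append(match.group(0).lower())
    return dict(hits)


if __name__ == "__main__":
    sample = (
        "The jaws grip the workpiece by friction. "
        "The lever is moved to the locked position."
    )
    for category, terms in extract_fbs(sample).items():
        print(category, terms)
```

Pure pattern matching of this kind cannot tell a verb from a noun (for example "grip" as an action versus "gripper" as a component), which is one reason the abstract pairs the Knowledge Base with computational linguistics tools.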