Large-scale, parallel automatic patent annotation

Authors:
Milan Agatonovic;Niraj Aswani;Kalina Bontcheva;Hamish Cunningham;Thomas Heitz;Yaoyong Li;Ian Roberts;Valentin Tablan
Affiliations:
University of Sheffield, Sheffield, United Kingdom;University of Sheffield, Sheffield, United Kingdom;University of Sheffield, Sheffield, United Kingdom;University of Sheffield, Sheffield, United Kingdom;University of Sheffield, Sheffield, United Kingdom;University of Sheffield, Sheffield, United Kingdom;University of Sheffield, Sheffield, United Kingdom;University of Sheffield, Sheffield, United Kingdom
Venue:
Proceedings of the 1st ACM workshop on Patent information retrieval
Year:
2008

Abstract

When researching new product ideas or filing new patents, inventors need to retrieve all relevant pre-existing know-how and/or to exploit and enforce patents in their technological domain. However, this process is hindered by the lack of rich metadata, which, if present, would allow more powerful concept-based search to complement the current keyword-based approach. This paper presents our approach to automatic patent enrichment, tested in large-scale, parallel experiments on USPTO and EPO documents. It starts by defining the metadata annotation task and examining its challenges. The text analysis tools are presented next, including details on the automatic annotation of sections, references and measurements. The key challenges encountered were dealing with ambiguities and errors in the data; creating and maintaining large, domain-independent dictionaries; and building an efficient, robust patent analysis pipeline capable of dealing with terabytes of data. The accuracy of the automatically created metadata is evaluated against a human-annotated gold standard, with results of over 90% on most annotation types.