Automatic extraction and resolution of bibliographical references in patent documents

Authors:
Patrice Lopez
Affiliations:
-
Venue:
IRFC'10 Proceedings of the First international Information Retrieval Facility conference on Advances in Multidisciplinary Retrieval
Year:
2010

Abstract

This paper describes experiments with Conditional Random Fields (CRFs) for extracting bibliographical references from patent documents. CRFs are used to perform extraction and parsing tasks, which are expressed as sequence tagging problems. The automatic recognition covers references to other patent documents and to scholarly publications, both of which are characterized by a strong variability of contexts and patterns. Our work is not limited to the extraction of reference blocks but also includes fine-grained parsing and the resolution of the bibliographical references, based on data normalization and access to different online bibliographical services. For these tasks, CRF models significantly outperform existing rule-based algorithms and other machine learning techniques, resulting in particular in very high performance for patent reference extraction, with a reduction of approximately 75% of the error rate compared to previous work.
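
To illustrate what expressing reference extraction as a sequence tagging problem can look like, here is a minimal sketch in Python. It is not the paper's implementation: the tokenizer, the BIO label scheme, the label names `PATENT` and `NPL`, and the example sentence are all assumptions made for illustration. A CRF would be trained to predict one such label per token; the sketch only shows how reference spans are encoded as labels and decoded back.

```python
import re
from typing import List, Tuple

def tokenize(text: str) -> List[str]:
    # Split into word-like tokens and single punctuation marks (assumed tokenizer).
    return re.findall(r"\w+|[^\w\s]", text)

def bio_labels(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    # spans are (start, end, kind) in token indices, end exclusive.
    labels = ["O"] * n_tokens
    for start, end, kind in spans:
        labels[start] = f"B-{kind}"          # first token of a reference
        for i in range(start + 1, end):
            labels[i] = f"I-{kind}"          # continuation tokens
    return labels

def decode_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    # Turn a label sequence (e.g. a tagger's output) back into reference strings.
    spans, current = [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:
                spans.append(current)
            current = (lab[2:], [tok])
        elif lab.startswith("I-") and current and current[0] == lab[2:]:
            current[1].append(tok)
        else:
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(kind, " ".join(toks)) for kind, toks in spans]

# Hypothetical sentence with one patent reference and one non-patent reference.
text = "as described in US 5,738,985 and in Smith et al., Nature 2001."
tokens = tokenize(text)
labels = bio_labels(len(tokens), [(3, 9, "PATENT"), (11, 18, "NPL")])
references = decode_spans(tokens, labels)
```

With this encoding, the same tagging machinery can serve both the extraction of reference blocks and, with a finer label set (author, title, date, and so on), the parsing of each extracted reference.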