Local adaptive extraction of references

Authors:
Peter Kluegl;Andreas Hotho;Frank Puppe
Affiliations:
University of Würzburg, Department of Computer Science VI, Würzburg, Germany;University of Würzburg, Department of Computer Science VI, Würzburg, Germany;University of Würzburg, Department of Computer Science VI, Würzburg, Germany
Venue:
KI'10 Proceedings of the 33rd annual German conference on Advances in artificial intelligence
Year:
2010

Abstract

The accurate extraction of scholarly reference information from scientific publications is essential for many useful applications such as BibTeX management systems or citation analysis. Automatic extraction methods suffer from the heterogeneity of reference notation, no matter whether the extraction model was handcrafted or learnt from labeled data. However, references within the same paper or journal are usually homogeneous. We exploit this local consistency with a novel approach. Given some initial information from such a reference section, we try to derive generalized patterns. These patterns are used to create a local model of the current document. The local model helps to identify errors and to improve the extracted information incrementally during the extraction process. Our approach is implemented with handcrafted transformation rules that work on a meta-level and can correct the information independently of the applied layout style. The experimental results compete very well with state-of-the-art methods and show extremely high performance on consistent reference sections.
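
The idea of exploiting local consistency can be illustrated with a small Python sketch. It is not the authors' rule set, which the abstract describes only as handcrafted transformation rules on a meta-level. It is a toy stand-in that assumes a single hypothetical "pattern" (the character that follows the author field), a majority vote as the local model, and made-up reference strings. The function names `boundary_signature`, `build_local_model`, and `correct_with_local_model` are invented for this illustration.

```python
from collections import Counter


def boundary_signature(reference: str, author_end: int) -> str:
    """Hypothetical pattern: the single character right after the author field."""
    return reference[author_end:author_end + 1]


def build_local_model(references, author_ends):
    """Local model (assumption): the most frequent boundary character
    among the initial extractions of one reference section."""
    if not references:
        raise ValueError("need at least one reference")
    counts = Counter(
        boundary_signature(ref, end) for ref, end in zip(references, author_ends)
    )
    return counts.most_common(1)[0][0]


def correct_with_local_model(references, author_ends, delimiter):
    """Move author boundaries that disagree with the local model to the
    first occurrence of the majority delimiter (a deliberate simplification)."""
    corrected = []
    for ref, end in zip(references, author_ends):
        if boundary_signature(ref, end) != delimiter:
            pos = ref.find(delimiter)
            if pos != -1:  # leave the boundary untouched if the delimiter is absent
                end = pos
        corrected.append(end)
    return corrected


# Made-up references in one consistent style: "Authors: Title. Venue (Year)".
refs = [
    "Doe, J., Roe, R.: First sample title. Sample Journal (2001)",
    "Poe, A.: Second sample title. Sample Proceedings (2003)",
    "Moe, B., Coe, C.: Third sample title. Sample Journal (2005)",
]
# Initial extraction from some generic extractor: the third boundary is wrong,
# because it stops after the first author ("Moe, B.").
initial_ends = [refs[0].index(":"), refs[1].index(":"), 7]

delimiter = build_local_model(refs, initial_ends)
fixed_ends = correct_with_local_model(refs, initial_ends, delimiter)
authors = [ref[:end] for ref, end in zip(refs, fixed_ends)]
```

In this sketch two of the three initial extractions agree on the colon as the author boundary, so the disagreeing third extraction is re-segmented to match. A real system would presumably need richer patterns than one character and would iterate as more fields are corrected, in line with the incremental improvement the abstract mentions.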