A comparison of metadata extraction techniques for crowdsourced bibliographic metadata management

Authors:
Michael Granitzer; Maya Hristakeva; Kris Jack; Robert Knight
Affiliations:
Knowledge Management Institute, Know-Center GmbH, Graz, Austria; Mendeley Ltd., London, UK; Mendeley Ltd., London, UK; Mendeley Ltd., London, UK
Venue:
Proceedings of the 27th Annual ACM Symposium on Applied Computing
Year:
2012

Citing (2)

Automatic document metadata extraction using support vector machines
Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries

Rule-based word clustering for document metadata extraction
Proceedings of the 2005 ACM symposium on Applied computing

Cited (2)

A comparison of layout based bibliographic metadata extraction techniques
Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics

Evaluation of header metadata extraction approaches and tools for scientific PDF documents
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries

Abstract

Social research networks such as Mendeley and CiteULike offer various services for collaboratively managing bibliographic metadata and uploading textual artifacts. A core problem for these services is extracting bibliographic metadata from the uploaded textual artifacts. Our work investigates the use of Conditional Random Fields and Support Vector Machines, as implemented in two state-of-the-art real-world systems, ParsCit and Mendeley Desktop, for automatically extracting bibliographic metadata. We compare the systems' accuracy on two newly created real-world data sets gathered from Mendeley and Linked-Open-Data repositories. Our analysis shows that two-stage SVMs provide reasonable performance in solving the challenge of metadata extraction from user-provided textual artifacts.