Extracting and matching authors and affiliations in scholarly documents

Authors:
Huy Hoang Nhat Do;Muthu Kumar Chandrasekaran;Philip S. Cho;Min Yen Kan
Affiliations:
National University of Singapore, Singapore, Singapore;National University of Singapore, Singapore, Singapore;National University of Singapore, Singapore, Singapore;National University of Singapore, Singapore, Singapore
Venue:
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Year:
2013

Citing 13

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning

Automatic document metadata extraction using support vector machines
Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries

FLUX-CIM: flexible unsupervised extraction of citation metadata
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries

CEBBIP: a parser of bibliographic information in Chinese electronic books
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries

SeerSuite: developing a scalable and reliable application framework for building digital libraries by crawling the web
WebApps'10 Proceedings of the 2010 USENIX conference on Web application development

An index to quantify an individual's scientific research output that takes into account the effect of multiple coauthorship
Scientometrics

LIBSVM: A library for support vector machines
ACM Transactions on Intelligent Systems and Technology (TIST)

Structure extraction from PDF-based book documents
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries

Affiliation analysis of database publications
ACM SIGMOD Record

An Automatic System for Extracting Figures and Captions in Biomedical PDF Documents
BIBM '11 Proceedings of the 2011 IEEE International Conference on Bioinformatics and Biomedicine

BibPro: A Citation Parser Based on Sequence Alignment
IEEE Transactions on Knowledge and Data Engineering

Web-based citation parsing, correction and augmentation
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries

Logical Structure Recovery in Scholarly Articles with Rich Document Features
International Journal of Digital Library Systems

Cited 1

Identifying research facilitators in an emerging Asian Research Area
Scientometrics

Quantified Score

Hi-index	0.00

Abstract

We introduce Enlil, an information extraction system that discovers the institutional affiliations of authors in scholarly papers. Enlil consists of two steps: a conditional random field first identifies authors and affiliations, and a support vector machine then connects authors to their affiliations. We benchmark Enlil in three separate experiments drawn from three different sources: the ACL Anthology Corpus, the ACM Digital Library, and a set of cross-disciplinary scientific journal articles acquired by querying Google Scholar. Against a state-of-the-art production baseline, Enlil reports a statistically significant improvement in F_1 of nearly 10% (p << 0.01). In the case of multidisciplinary articles from Google Scholar, Enlil is benchmarked over both clean input (F_1 > 90%) and automatically-acquired input (F_1 > 80%). We have deployed Enlil in a case study involving Asian genomics research publication patterns to understand how government-sponsored collaborative links evolve. Enlil has enabled our team to construct and validate new metrics to quantify the facilitation of research as opposed to direct publication.
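
The F_1 figures in the abstract score how well predicted author-affiliation links agree with a gold standard. The following is a minimal sketch of that scoring, assuming links are compared as exact (author, affiliation) pairs. The function match_by_marker is a hypothetical rule-based stand-in for the second step, not the support vector machine the abstract describes, and all names and sample data are invented for illustration.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (author name, affiliation name)


def match_by_marker(
    authors: List[Tuple[str, Set[str]]],
    affiliations: List[Tuple[str, str]],
) -> Set[Pair]:
    """Hypothetical stand-in for the matching step.

    Links an author to every affiliation whose marker (e.g. a superscript
    "1") appears among the author's markers. The system in the abstract
    uses a learned classifier here; this rule is only an assumption made
    to keep the sketch self-contained.
    """
    return {
        (author, affiliation)
        for author, markers in authors
        for affiliation, marker in affiliations
        if marker in markers
    }


def f1_score(predicted: Set[Pair], gold: Set[Pair]) -> float:
    """Harmonic mean of precision and recall over author-affiliation pairs."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented sample data: two authors, two affiliations.
authors = [("Author A", {"1"}), ("Author B", {"1", "2"})]
affiliations = [("University X", "1"), ("Institute Y", "2")]

predicted = match_by_marker(authors, affiliations)
gold = {("Author A", "University X"), ("Author B", "Institute Y")}

score = f1_score(predicted, gold)
print(f"F_1 = {score:.2f}")  # prints "F_1 = 0.80"
```

In this toy case the matcher proposes three links, two of which are in the gold set, so precision is 2/3, recall is 1, and F_1 is 0.8. Whether the paper scores links per pair or per author is not stated in the abstract, so the pair-level comparison here is an assumption.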