We introduce Enlil, an information extraction system that discovers the institutional affiliations of authors in scholarly papers. Enlil works in two steps: a conditional random field first identifies authors and affiliations, and a support vector machine then connects authors to their affiliations. We benchmark Enlil in three separate experiments drawn from three different sources: the ACL Anthology Corpus, the ACM Digital Library, and a set of cross-disciplinary scientific journal articles acquired by querying Google Scholar. Against a state-of-the-art production baseline, Enlil reports a statistically significant improvement in F_1 of nearly 10%. [...] (F_1 90%) and automatically-acquired input (F_1 80%). We have deployed Enlil in a case study of Asian genomics research publication patterns to understand how government-sponsored collaborative links evolve. Enlil has enabled our team to construct and validate new metrics that quantify the facilitation of research as opposed to direct publication.