A hybrid two-stage approach for discipline-independent canonical representation extraction from references

Authors:
Sung Hee Park; Roger W. Ehrich; Edward A. Fox
Affiliations:
Digital Library Research Laboratory, Virginia Tech, Blacksburg, VA, USA; Center for Human Computer Interaction, Virginia Tech, Blacksburg, VA, USA; Digital Library Research Laboratory, Virginia Tech, Blacksburg, VA, USA
Venue:
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Year:
2012

References cited: 23
Cited by: 0

Support-Vector Networks. Machine Learning.
Conceptual-model-based data extraction from multiple-record Web pages. Data & Knowledge Engineering.
Digital Libraries and Autonomous Citation Indexing. Computer.
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML '01: Proceedings of the Eighteenth International Conference on Machine Learning.
Automatic document metadata extraction using support vector machines. Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries.
Tag Insertion Complexity. DCC '01: Proceedings of the Data Compression Conference.
Automatic text summarization based on the Global Document Annotation. COLING '98: Proceedings of the 17th international conference on Computational linguistics, Volume 2.
Information extraction from research papers using conditional random fields. Information Processing and Management: an International Journal.
Collective entity resolution in relational data. ACM Transactions on Knowledge Discovery from Data (TKDD).
Semi-supervised conditional random fields for improved sequence segmentation and labeling. ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics.
Reference metadata extraction using a hierarchical knowledge representation framework. Decision Support Systems.
FLUX-CIM: flexible unsupervised extraction of citation metadata. Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries.
Simple, robust, scalable semi-supervised learning via expectation regularization. Proceedings of the 24th international conference on Machine learning.
Metadata Extraction from Chinese Research Papers Based on Conditional Random Fields. FSKD '07: Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery, Volume 01.
Genre as noise: noise in genre. International Journal on Document Analysis and Recognition.
A simple method for citation metadata extraction using hidden markov models. Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries.
Learning a two-stage SVM/CRF sequence classifier. Proceedings of the 17th ACM conference on Information and knowledge management.
Predicting structured objects with support vector machines. Communications of the ACM (issue: Scratch Programming for All).
Joint inference in information extraction. AAAI'07: Proceedings of the 22nd national conference on Artificial intelligence, Volume 1.
FireCite: lightweight real-time reference string extraction from webpages. NLPIR4DL '09: Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries.
Locating and parsing bibliographic references in HTML medical articles. International Journal on Document Analysis and Recognition (Special Issue DRR09).
Machine reading at the University of Washington. FAM-LbR '10: Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading.
Applying weighted PageRank to author citation networks. Journal of the American Society for Information Science and Technology.

Abstract

References play a key role in education and research. However, extracting and parsing references is difficult. One reason is that there are many reference styles, so identifying which style a given surface form uses is problematic, especially in heterogeneous collections of theses and dissertations that span many fields and disciplines and in which different styles may appear even within the same publication. We address these problems by drawing on suitable knowledge found on the Web. In particular, we investigate a two-stage classifier approach involving multi-class classification of reference styles, and we partially solve the problem of parsing the surface representations of references. We present empirical evidence for the effectiveness of our approach and outline plans for improving our methods.
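One way to picture a two-stage approach to reference parsing is to first predict which style a reference string uses and then hand it to a parser specialized for that style, which maps it onto a shared canonical record. The minimal Python sketch below illustrates only that general idea and is not the authors' method. Their style classifier is a multi-class model that draws on knowledge from the Web, whereas `classify_style` here is a toy rule-based stand-in. The two styles (`"apa"`, `"ieee"`), the regular expressions, and the `CanonicalRef` fields are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CanonicalRef:
    """Hypothetical canonical form that every style is mapped onto."""
    authors: str
    year: Optional[str]
    title: str


def classify_style(ref: str) -> str:
    """Stage 1: predict the reference style.

    Toy rules standing in for a trained multi-class classifier (assumption).
    """
    if re.match(r"^\[\d+\]", ref):       # numbered label, e.g. "[1] ..."
        return "ieee"
    if re.search(r"\(\d{4}\)\.", ref):   # parenthesised year, e.g. "(2001)."
        return "apa"
    return "unknown"


def parse_apa(ref: str) -> Optional[CanonicalRef]:
    """Stage 2 parser for APA-like strings: Authors (Year). Title. ..."""
    m = re.match(r"^(?P<authors>.+?) \((?P<year>\d{4})\)\. (?P<title>[^.]+)\.", ref)
    if not m:
        return None
    return CanonicalRef(m["authors"], m["year"], m["title"])


def parse_ieee(ref: str) -> Optional[CanonicalRef]:
    """Stage 2 parser for IEEE-like strings: [n] Authors, "Title," Venue, ..., Year."""
    m = re.match(r'^\[\d+\]\s*(?P<authors>.+?), "(?P<title>[^"]+?),?"', ref)
    if not m:
        return None
    # Take the last 19xx/20xx token as the publication year.
    years = re.findall(r"\b(?:19|20)\d{2}\b", ref)
    return CanonicalRef(m["authors"], years[-1] if years else None, m["title"])


# Stage-2 dispatch table keyed by the label predicted in stage 1.
PARSERS: Dict[str, Callable[[str], Optional[CanonicalRef]]] = {
    "apa": parse_apa,
    "ieee": parse_ieee,
}


def extract(ref: str) -> Optional[CanonicalRef]:
    """Run both stages; return None if the style is unknown or parsing fails."""
    parser = PARSERS.get(classify_style(ref))
    return parser(ref) if parser else None
```

For instance, `extract('[1] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.')` routes the string to `parse_ieee` and yields a record with the authors, the title, and the year 1995. Keying the second stage on the predicted style keeps each parser small. However, a stage-1 error sends the string to the wrong parser, which is one reason a real system would need a far more robust first stage than these rules.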