Citations are used ubiquitously in biomedical full-text articles and play an important role in representing both the rhetorical structure and the semantic content of the articles. As a result, text-mining systems would benefit significantly from a tool that automatically extracts the content of a citation. In this study, we applied a supervised machine-learning algorithm, Conditional Random Fields (CRFs), to automatically parse a citation into its fields (e.g., Author, Title, Journal, and Year). On a subset of open-access PubMed Central articles in HTML format, we report an overall F1-score of 97.95%. The citation parser can be accessed at: http://www.cs.uwm.edu/~qing/projects/cithit/index.html.
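The abstract does not describe the parser's feature set or learned weights. The sketch below only illustrates how a trained linear-chain CRF labels a tokenized citation: Viterbi decoding picks the field sequence that maximizes the sum of per-token feature weights and label-to-label transition weights. Everything specific here is a hypothetical assumption chosen for illustration: the feature names (`first_token`, `is_initial`, `is_lower`, `after_period`, `is_year`), the hand-set weight values, and the strict AUTHOR → TITLE → JOURNAL → YEAR ordering. A real CRF learns its weights from annotated citations, and that training step is not shown.

```python
import re
from typing import Dict, List, Tuple

LABELS = ["AUTHOR", "TITLE", "JOURNAL", "YEAR"]

# Hypothetical, hand-set weights (feature, label) -> weight.
# A trained CRF would learn these from labeled citations.
EMISSION: Dict[Tuple[str, str], float] = {
    ("first_token", "AUTHOR"): 2.0,
    ("is_initial", "AUTHOR"): 3.0,
    ("is_lower", "TITLE"): 2.0,
    ("after_period", "TITLE"): 1.0,
    ("after_period", "JOURNAL"): 1.0,
    ("is_year", "YEAR"): 5.0,
}
FORBIDDEN = -10.0  # large penalty for out-of-order field transitions


def transition(prev: str, cur: str) -> float:
    """Reward staying in a field, allow moving to the next field, penalize the rest."""
    i, j = LABELS.index(prev), LABELS.index(cur)
    if j == i:
        return 1.0
    if j == i + 1:
        return 0.0
    return FORBIDDEN


def features(tokens: List[str], i: int) -> List[str]:
    """Binary observation features for token i (hypothetical feature set)."""
    tok = tokens[i]
    feats = []
    if i == 0:
        feats.append("first_token")
    if re.fullmatch(r"[A-Z]{1,2}[.,]?", tok):  # e.g. "J," or "A."
        feats.append("is_initial")
    if tok[0].islower():
        feats.append("is_lower")
    if i > 0 and tokens[i - 1].endswith("."):
        feats.append("after_period")
    if re.fullmatch(r"(19|20)\d{2}\.?", tok):
        feats.append("is_year")
    return feats


def emission(tokens: List[str], i: int, label: str) -> float:
    """Sum of weights of the features that fire for token i under this label."""
    return sum(EMISSION.get((f, label), 0.0) for f in features(tokens, i))


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the linear-chain model."""
    # Citations are assumed to start with the author field.
    start = {y: (0.0 if y == "AUTHOR" else FORBIDDEN) for y in LABELS}
    # score[y]: best score of any labeling of tokens[:i + 1] that ends in label y
    score = {y: start[y] + emission(tokens, 0, y) for y in LABELS}
    backptrs: List[Dict[str, str]] = []
    for i in range(1, len(tokens)):
        new_score, ptr = {}, {}
        for y in LABELS:
            best_prev = max(LABELS, key=lambda p: score[p] + transition(p, y))
            new_score[y] = (score[best_prev] + transition(best_prev, y)
                            + emission(tokens, i, y))
            ptr[y] = best_prev
        score = new_score
        backptrs.append(ptr)
    # Follow back-pointers from the best final label to recover the path.
    path = [max(LABELS, key=lambda y: score[y])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    example = "Smith J, Doe A. Gene regulation in yeast. Nature 2005.".split()
    for tok, lab in zip(example, viterbi(example)):
        print(f"{tok:12s} {lab}")
```

On the made-up citation in the `__main__` block, this decodes the first four tokens as AUTHOR, the next four as TITLE, `Nature` as JOURNAL, and `2005.` as YEAR. The transition weights carry the field-ordering knowledge, while the observation features only hint at where each field begins; that division of labor is what lets a CRF segment a citation jointly rather than token by token.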