Parsing citations in biomedical articles using conditional random fields

  • Authors:
  • Qing Zhang;Yong-Gang Cao;Hong Yu

  • Affiliations:
  • University of Wisconsin-Milwaukee, Milwaukee, WI, 53211, USA;University of Wisconsin-Milwaukee, Milwaukee, WI, 53211, USA;University of Wisconsin-Milwaukee, Milwaukee, WI, 53211, USA

  • Venue:
  • Computers in Biology and Medicine
  • Year:
  • 2011

Abstract

Citations are used ubiquitously in biomedical full-text articles and play an important role in representing both the rhetorical structure and the semantic content of the articles. As a result, text mining systems will significantly benefit from a tool that automatically extracts the content of a citation. In this study, we applied the supervised machine-learning algorithm Conditional Random Fields (CRFs) to automatically parse a citation into its fields (e.g., Author, Title, Journal, and Year). On a subset of open-access PubMed Central articles in HTML format, we report an overall F1-score of 97.95%. The citation parser can be accessed at: http://www.cs.uwm.edu/~qing/projects/cithit/index.html.
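
The abstract does not list the features given to the CRF, so the following is only a minimal sketch of how a citation string might be turned into per-token feature dictionaries for a sequence labeler of this kind. The function names (tokenize, token_features, citation_features) and the feature set are illustrative assumptions, not the authors' implementation.

```python
import re


def tokenize(citation):
    """Split a citation string into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", citation)


def token_features(tokens, i):
    """Build a feature dict for token i (hypothetical feature set for illustration)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "is_digit": tok.isdigit(),
        # Four-digit numbers starting with 19 or 20 are likely publication years.
        "looks_like_year": re.fullmatch(r"(19|20)\d{2}", tok) is not None,
        "is_punct": re.fullmatch(r"[^\w\s]", tok) is not None,
        # Coarse position in the citation: authors tend to come first, year late.
        "relative_position": round(i / len(tokens), 1),
        # Neighboring tokens give the labeler local context.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def citation_features(citation):
    """Return the tokens and one feature dict per token."""
    tokens = tokenize(citation)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]
```

A CRF toolkit would then be trained on such feature sequences paired with one field label per token (for example Author, Title, Journal, Year), and the predicted label sequence would be grouped back into fields.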