Extracting formulaic and free text clinical research articles metadata using conditional random fields

  • Authors:
  • Sein Lin; Jun-Ping Ng; Shreyasee Pradhan; Jatin Shah; Ricardo Pietrobon; Min-Yen Kan

  • Affiliations:
  • National University of Singapore; National University of Singapore; Duke-NUS Graduate Medical School Singapore; Duke-NUS Graduate Medical School Singapore; Duke-NUS Graduate Medical School Singapore; National University of Singapore

  • Venue:
  • Louhi '10: Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents
  • Year:
  • 2010

Abstract

We explore the use of conditional random fields (CRFs) to automatically extract important metadata from clinical research articles. These metadata fields include formulaic metadata about the authors, extracted from the title page, as well as free-text fields describing the study's critical parameters, such as longitudinal variables and medical intervention methods, extracted from the body text of the article. Extracting such information can help readers conduct deep semantic searches of articles, and can help policy makers and sociologists track macro-level trends in research. Preliminary results show an acceptable level of performance on formulaic metadata and high precision on the free-text fields.
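The abstract does not describe the feature set, label inventory, or training setup, so the following is only a hedged sketch of how linear-chain CRF decoding can label title-page tokens. The label set (`AUTHOR`, `AFFIL`, `OTHER`), the features, and the hand-set weights in `EMISSION` and `TRANSITION` are hypothetical illustrations, not the authors' model. A real CRF would learn these weights from annotated articles. The sketch shows only Viterbi inference: each token's label score combines per-token feature weights with a transition weight from the previous label, and the highest-scoring sequence overall is kept.

```python
from typing import Dict, List, Tuple

# Hypothetical label set for title-page tokens (illustrative only).
LABELS = ["AUTHOR", "AFFIL", "OTHER"]
AFFIL_KEYWORDS = {"university", "school", "institute", "hospital"}


def token_features(tok: str) -> List[str]:
    """Binary features that fire for a single token (hypothetical feature set)."""
    feats = ["bias"]
    if tok[:1].isupper():
        feats.append("is_capitalized")
    if tok.lower() in AFFIL_KEYWORDS:
        feats.append("affil_keyword")
    if tok in {",", ";"}:
        feats.append("is_punct")
    return feats


# Hand-set, hypothetical weights; a trained CRF would learn these from data.
EMISSION: Dict[Tuple[str, str], float] = {
    ("is_capitalized", "AUTHOR"): 1.0,
    ("is_capitalized", "AFFIL"): 0.3,
    ("affil_keyword", "AFFIL"): 3.0,
    ("is_punct", "OTHER"): 2.0,
    ("bias", "OTHER"): 0.1,
}
# Weights on (previous label, current label) pairs; missing pairs score 0.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("AUTHOR", "AUTHOR"): 1.0,
    ("AFFIL", "AFFIL"): 1.0,
    ("AUTHOR", "AFFIL"): -1.0,
    ("AFFIL", "AUTHOR"): -1.0,
    ("OTHER", "AFFIL"): 0.5,
}


def emission_score(tok: str, label: str) -> float:
    """Sum of the weights of all features that fire for this token and label."""
    return sum(EMISSION.get((f, label), 0.0) for f in token_features(tok))


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the linear-chain model."""
    if not tokens:
        return []
    # best[i][y]: score of the best labeling of tokens[0..i] that ends in label y
    best: List[Dict[str, float]] = [{y: emission_score(tokens[0], y) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]  # back[i][y]: best label at position i-1
    for tok in tokens[1:]:
        prev = best[-1]
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            p = max(LABELS, key=lambda x: prev[x] + TRANSITION.get((x, y), 0.0))
            scores[y] = prev[p] + TRANSITION.get((p, y), 0.0) + emission_score(tok, y)
            pointers[y] = p
        best.append(scores)
        back.append(pointers)
    # Trace back from the best-scoring final label.
    y = max(LABELS, key=lambda lab: best[-1][lab])
    path = [y]
    for i in range(len(tokens) - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]


if __name__ == "__main__":
    toks = ["Min-Yen", "Kan", ",", "National", "University", "of", "Singapore"]
    print(list(zip(toks, viterbi(toks))))
```

With these toy weights, the name tokens come out as `AUTHOR`, the comma as `OTHER`, and the whole of "National University of Singapore" as `AFFIL`. The lowercase "of" and the weakly scored "National" and "Singapore" are pulled into the affiliation by the transition weights rather than by their own features. That sequence-level consistency is the usual motivation for a CRF over independent per-token classifiers.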