Extracting formulaic and free text clinical research articles metadata using conditional random fields

Authors:
Sein Lin;Jun-Ping Ng;Shreyasee Pradhan;Jatin Shah;Ricardo Pietrobon;Min-Yen Kan
Affiliations:
National University of Singapore;National University of Singapore;Duke-NUS Graduate Medical School Singapore;Duke-NUS Graduate Medical School Singapore;Duke-NUS Graduate Medical School Singapore;National University of Singapore
Venue:
Louhi '10 Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents
Year:
2010

Abstract

We explore the use of conditional random fields (CRFs) to automatically extract important metadata from clinical research articles. These fields include formulaic metadata about the authors, extracted from the title page, as well as free-text fields concerning the study's critical parameters, such as longitudinal variables and medical intervention methods, extracted from the body text of the article. Extracting such information can help readers conduct deep semantic search of articles, and can help policy makers and sociologists track macro-level trends in research. Preliminary results show an acceptable level of performance for formulaic metadata and high precision for the free-text fields.
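The abstract does not describe the model's features or label set. As a rough illustration of how a linear-chain CRF labels title-page tokens, the sketch below finds the highest-scoring label sequence with Viterbi decoding. The label set (AUTHOR, AFFILIATION, OTHER), the feature functions, and every weight are hypothetical and hand-set for illustration. A real CRF learns its weights from annotated articles, and this is not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical label set for title-page tokens (not the paper's actual schema).
LABELS = ["AUTHOR", "AFFILIATION", "OTHER"]

# Hypothetical hand-set weights; a trained CRF would learn these from data.
EMISSION: Dict[Tuple[str, str], float] = {
    ("is_capitalized", "AUTHOR"): 1.0,
    ("is_capitalized", "AFFILIATION"): 0.5,
    ("is_institution_word", "AFFILIATION"): 3.0,
    ("is_separator", "OTHER"): 2.0,
    ("bias", "OTHER"): 0.2,
}
TRANSITION: Dict[Tuple[str, str], float] = {
    ("AUTHOR", "AUTHOR"): 1.0,
    ("AFFILIATION", "AFFILIATION"): 1.5,
    ("AUTHOR", "OTHER"): 0.5,
    ("OTHER", "AUTHOR"): 0.5,
    ("OTHER", "AFFILIATION"): 0.5,
}


def token_features(tokens: List[str], t: int) -> List[str]:
    """Binary observation features for token t (hypothetical feature set)."""
    tok = tokens[t]
    feats = ["bias"]
    if tok[:1].isupper():
        feats.append("is_capitalized")
    if tok.lower() in {"university", "school", "institute"}:
        feats.append("is_institution_word")
    if tok in {";", ","}:
        feats.append("is_separator")
    return feats


def emission_score(tokens: List[str], t: int, label: str) -> float:
    """Sum of weights for the features that fire on token t under this label."""
    return sum(EMISSION.get((f, label), 0.0) for f in token_features(tokens, t))


def viterbi(tokens: List[str]) -> List[str]:
    """Return the label sequence maximizing the linear-chain CRF score."""
    if not tokens:
        return []
    # delta[t][y]: best score of any label prefix ending with label y at position t.
    delta = [{y: emission_score(tokens, 0, y) for y in LABELS}]
    # back[t][y]: the previous label on that best prefix.
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(tokens)):
        delta.append({})
        back.append({})
        for y in LABELS:
            prev = max(LABELS, key=lambda p: delta[t - 1][p] + TRANSITION.get((p, y), 0.0))
            delta[t][y] = (delta[t - 1][prev] + TRANSITION.get((prev, y), 0.0)
                           + emission_score(tokens, t, y))
            back[t][y] = prev
    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: delta[-1][y])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


print(viterbi(["Sein", "Lin", ";", "National", "University", "of", "Singapore"]))
# → ['AUTHOR', 'AUTHOR', 'OTHER', 'AFFILIATION', 'AFFILIATION', 'AFFILIATION', 'AFFILIATION']
```

Scored in isolation, "National" slightly prefers AUTHOR (emission 1.0 versus 0.5), yet the decoded path labels it AFFILIATION because the AFFILIATION-to-AFFILIATION transition ties it to "University". Using neighbouring labels in this way is what separates a CRF from classifying each token independently with the same features.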