Automatic classification of sentences for evidence based medicine

  • Authors:
  • Su Nam Kim; David Martinez; Lawrence Cavedon

  • Affiliations:
  • NICTA & University of Melbourne, Melbourne, Australia; NICTA & University of Melbourne, Melbourne, Australia; NICTA & University of Melbourne, Melbourne, Australia

  • Venue:
  • DTMBIO '10 Proceedings of the ACM fourth international workshop on Data and text mining in biomedical informatics
  • Year:
  • 2010

Abstract

AIM: Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels.

METHOD: We construct a corpus of 1,000 medical abstracts annotated by hand with medical categories (e.g. "Intervention", "Outcome"). We explore the use of various features based on lexical, semantic, structural, and sequential information in the data, using Conditional Random Fields (CRF) for classification.

RESULT: For the classification tasks over all labels, our systems achieved micro-averaged F-scores of 80.9% and 66.9% on the structured and unstructured datasets, respectively, using sequential features. In labeling only key sentences, our systems produced F-scores of 89.3% and 74.0% on the structured and unstructured datasets, respectively, using the same sequential features. The results over an external dataset were lower (F-scores of 63.1% for all labels and 83.8% for key sentences).

CONCLUSION: Of the features we used, the best for classifying any given sentence in an abstract are based on unigrams, section headings, and sequential information from preceding sentences. These features resulted in improved performance over a simple bag-of-words approach, and outperformed feature sets used in previous work.
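
The feature types named in the conclusion (unigrams, section headings, and information from the preceding sentence) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the feature-name prefixes, and the tokenisation are hypothetical, and the abstract does not describe the paper's actual feature encoding. Sequential information in a CRF comes largely from label transitions; here it is only approximated by copying the previous sentence's unigrams into the current sentence's features.

```python
import re


def tokenize(sentence):
    """Lowercase and split into alphanumeric tokens (an assumed tokenisation)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def sentence_features(sentences, headings, i):
    """Build a feature dict for sentence i of a single abstract.

    sentences: list of sentence strings, in document order.
    headings:  list of section headings (or None for unstructured
               abstracts), one entry per sentence.
    """
    # Unigram (bag-of-words) features for the current sentence.
    feats = {f"w={tok}": 1.0 for tok in tokenize(sentences[i])}

    # Section-heading feature, available only in structured abstracts.
    if headings[i] is not None:
        feats[f"heading={headings[i].lower()}"] = 1.0

    # Context from the preceding sentence (hypothetical encoding).
    if i == 0:
        feats["BOS"] = 1.0  # marks the first sentence of the abstract
    else:
        for tok in tokenize(sentences[i - 1]):
            feats[f"prev_w={tok}"] = 1.0
    return feats


def abstract_features(sentences, headings):
    """Return one feature dict per sentence, in document order."""
    return [sentence_features(sentences, headings, i)
            for i in range(len(sentences))]
```

A list of such dicts, one list per abstract, is the kind of input that a sequence labeller such as a CRF toolkit typically accepts, paired with one gold label per sentence.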