Automatic classification of sentences for evidence based medicine

Authors:
Su Nam Kim; David Martinez; Lawrence Cavedon
Affiliations:
NICTA & University of Melbourne, Melbourne, Australia; NICTA & University of Melbourne, Melbourne, Australia; NICTA & University of Melbourne, Melbourne, Australia
Venue:
DTMBIO '10 Proceedings of the ACM fourth international workshop on Data and text mining in biomedical informatics
Year:
2010

Abstract

AIM: Given a set of pre-defined medical categories used in Evidence Based Medicine, we aim to automatically annotate sentences in medical abstracts with these labels.

METHOD: We construct a corpus of 1,000 medical abstracts annotated by hand with medical categories (e.g. "Intervention", "Outcome"). We explore the use of various features based on lexical, semantic, structural, and sequential information in the data, using Conditional Random Fields (CRF) for classification.

RESULT: For the classification tasks over all labels, our systems achieved micro-averaged F-scores of 80.9% and 66.9% on the structured and unstructured datasets, respectively, using sequential features. In labeling only key sentences, our systems produced F-scores of 89.3% and 74.0% on the structured and unstructured datasets, respectively, using the same sequential features. The results over an external dataset were lower (F-scores of 63.1% for all labels and 83.8% for key sentences).

CONCLUSION: Of the features we used, the best for classifying any given sentence in an abstract are based on unigrams, section headings, and sequential information from preceding sentences. These features improved performance over a simple bag-of-words approach and outperformed feature sets used in previous work.
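
To make the feature types and the evaluation metric named in the abstract more concrete, here is a minimal Python sketch. It is not the authors' implementation. The function names, the whitespace tokenization, the feature naming scheme, and the representation of each sentence's labels as a set are all assumptions made for illustration, and the label "Other" in the usage note is a placeholder. The first function builds a feature dictionary from unigrams, the section heading, and the preceding sentence, the kind of input a CRF toolkit typically accepts. The second computes a micro-averaged F-score by pooling true positives, false positives, and false negatives over all sentences.

```python
def sentence_features(sentences, headings, i):
    """Build a feature dict for sentence i of one abstract (illustrative only).

    sentences: list of sentence strings in abstract order.
    headings:  list of section headings aligned with sentences
               (None where the abstract is unstructured).
    """
    # Structural feature: relative position of the sentence in the abstract.
    feats = {"position": i / max(len(sentences) - 1, 1)}

    # Lexical features: unigrams of the current sentence (naive tokenization).
    for tok in sentences[i].lower().split():
        feats["w=" + tok] = 1.0

    # Structural feature: section heading, when the abstract provides one.
    if headings[i] is not None:
        feats["heading=" + headings[i].lower()] = 1.0

    # Sequential features: unigrams of the preceding sentence,
    # or a begin-of-abstract marker for the first sentence.
    if i == 0:
        feats["BOS"] = 1.0
    else:
        for tok in sentences[i - 1].lower().split():
            feats["prev_w=" + tok] = 1.0
    return feats


def micro_f1(gold, pred):
    """Micro-averaged F-score over sentences.

    gold, pred: lists of label sets, one set per sentence (an assumption;
    with exactly one label per sentence this reduces to accuracy).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in gold
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

As a usage example, `micro_f1([{"Intervention"}, {"Outcome"}, {"Other"}], [{"Intervention"}, {"Outcome", "Intervention"}, set()])` pools 2 true positives, 1 false positive, and 1 false negative, giving precision 2/3, recall 2/3, and an F-score of 2/3.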