Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion

Authors:
Shashank Agarwal;Hong Yu
Venue:
Bioinformatics
Year:
2009

Abstract

Biomedical texts can typically be represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches to automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then explored different approaches for automatically classifying these sentences into the IMRAD categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naïve Bayes classifier trained on manually annotated data; it achieved 91.95% accuracy and an average F-score of 91.55%, significantly higher than the baseline systems. A web version of this system is available online at http://wood.ims.uwm.edu/full_text_classifier/. Contact: hongyu@uwm.edu
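
As a rough illustration of the classifier family named in the abstract, the following is a minimal sketch of a multinomial naive Bayes sentence classifier with add-one (Laplace) smoothing over bag-of-words counts. It is not the authors' system: the whitespace tokenizer, the `alpha` smoothing parameter, the class and function names, and the eight toy training sentences are all assumptions made for illustration, since the abstract does not specify the features or the training data.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    return sentence.lower().split()


class MultinomialNB:
    """Bag-of-words multinomial naive Bayes with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value, not from the paper)

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)      # sentences per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = sum(self.class_counts.values())
        return self

    def log_scores(self, sentence):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(sentence) if t in self.vocab]
        vocab_size = len(self.vocab)
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n / self.total)  # log prior
            for t in tokens:
                # Smoothed log likelihood of the token under this category.
                score += math.log((counts[t] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, sentence):
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)


# Toy training data, invented for this sketch (two sentences per IMRAD category).
TRAIN = [
    ("previous studies have suggested that this gene plays a role", "Introduction"),
    ("little is known about the role of this protein", "Introduction"),
    ("cells were incubated at 37 degrees for two hours", "Methods"),
    ("samples were centrifuged and washed twice", "Methods"),
    ("we observed a significant increase in expression", "Results"),
    ("expression levels increased as shown in figure 2", "Results"),
    ("these findings suggest that the pathway may be involved", "Discussion"),
    ("our results may be explained by differences in the samples", "Discussion"),
]

model = MultinomialNB().fit([s for s, _ in TRAIN], [c for _, c in TRAIN])

if __name__ == "__main__":
    for query in ["cells were incubated", "we observed a significant increase"]:
        print(query, "->", model.predict(query))
```

Scores are accumulated in log space so that long sentences do not underflow, and the smoothing term keeps a category from being ruled out by a single token it has not seen. A realistic system would need far more annotated sentences and richer features than this toy setup.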