Applying active learning to assertion classification of concepts in clinical text

Authors:
Yukun Chen;Subramani Mani;Hua Xu
Affiliations:
Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, TN, USA;Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, TN, USA;Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, TN, USA
Venue:
Journal of Biomedical Informatics
Year:
2012

Citing 12
Cited 1

Multiple comparison procedures

Multiple comparison procedures
Query by committee

COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
A sequential algorithm for training text classifiers

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Support vector machine active learning for image retrieval

MULTIMEDIA '01 Proceedings of the ninth ACM international conference on Multimedia
Toward Optimal Active Learning through Sampling Estimation of Error Reduction

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Employing EM and Pool-Based Active Learning for Text Classification

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Support vector machine active learning with applications to text classification

The Journal of Machine Learning Research
Active learning for statistical natural language parsing

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
An empirical study of the behavior of active learning for word sense disambiguation

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
LIBLINEAR: A Library for Large Linear Classification

The Journal of Machine Learning Research
An analysis of active learning strategies for sequence labeling tasks

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
MMR-based active machine learning for bio named entity recognition

NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

Semi-supervised clinical text classification with Laplacian SVMs: An application to cancer case management

Journal of Biomedical Informatics

Quantified Score

Hi-index	0.00

Visualization

Abstract

Supervised machine learning methods for clinical natural language processing (NLP) research require a large number of annotated samples, which are very expensive to build because of the involvement of physicians. Active learning, an approach that actively samples from a large pool, provides an alternative solution. Its major goal in classification is to reduce the annotation effort while maintaining the quality of the predictive model. However, few studies have investigated its uses in clinical NLP. This paper reports an application of active learning to a clinical text classification task: to determine the assertion status of clinical concepts. The annotated corpus for the assertion classification task in the 2010 i2b2/VA Clinical NLP Challenge was used in this study. We implemented several existing and newly developed active learning algorithms and assessed their uses. The outcome is reported in the global ALC score, based on the Area under the average Learning Curve of the AUC (Area Under the Curve) score. Results showed that when the same number of annotated samples was used, active learning strategies could generate better classification models (best ALC - 0.7715) than the passive learning method (random sampling) (ALC - 0.7411). Moreover, to achieve the same classification performance, active learning strategies required fewer samples than the random sampling method. For example, to achieve an AUC of 0.79, the random sampling method used 32 samples, while our best active learning algorithm required only 12 samples, a reduction of 62.5% in manual annotation effort.