Support Vector Machine Active Learning with Applications to Text Classification
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Employing EM and Pool-Based Active Learning for Text Classification
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Proactive learning: cost-sensitive active learning with multiple imperfect oracles
Proceedings of the 17th ACM conference on Information and knowledge management
Estimating annotation cost for active learning in a multi-annotator environment
HLT '09 Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing
A web survey on the use of active learning to support annotation of text data
HLT '09 Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Active learning for biomedical citation screening
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Deploying an interactive machine learning system in an evidence-based practice center: abstrackr
Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium
Comparative effectiveness reviews (CERs), a central methodology of comparative effectiveness research, are increasingly used to inform healthcare decisions. During these systematic reviews of the scientific literature, the reviewers (MD-methodologists) must screen several thousand citations for eligibility according to a pre-specified protocol. While previous research has demonstrated the theoretical potential of machine learning to reduce the workload in CERs, practical obstacles to deploying such a system remain. In this article, we describe work on an end-to-end, interactive machine learning system for assisting reviewers with the tedious task of citation screening for CERs. Specifically, we present ABSTRACKR, our open-source annotation tool. In addition to allowing reviewers to designate citations as 'relevant' or 'irrelevant' to the review at hand, ABSTRACKR facilitates communicating other information useful to the classification model, such as terms that are suggestive of the relevance (or irrelevance) of a citation. The tool also records the time taken to screen citations, on which we conducted a time-series analysis to derive an annotator model. Using this model, we found that both the order in which the citations are screened and the length of each citation affect annotation time. We propose a strategy that integrates labeled terms and timing data into the Active Learning (AL) framework, in which an algorithm selects citations for the reviewer to label. We demonstrate empirically that this additional information can improve the performance of the semi-automated citation screening system.
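To make the AL framework concrete, the sketch below shows pool-based uncertainty sampling, a common way an algorithm can select which citations a reviewer labels next. This is an illustrative assumption, not ABSTRACKR's actual implementation; the `predict_proba` scoring function and citation names are hypothetical stand-ins for a trained relevance classifier.

```python
# Illustrative sketch of pool-based active learning for citation screening.
# NOT abstrackr's implementation: predict_proba is a hypothetical stand-in
# for a classifier's estimate of P(relevant | citation).

def uncertainty(p_relevant):
    """Distance from the 0.5 decision boundary; smaller = more uncertain."""
    return abs(p_relevant - 0.5)

def select_next(pool, predict_proba, batch_size=1):
    """Pick the unlabeled citations the model is least certain about."""
    ranked = sorted(pool, key=lambda c: uncertainty(predict_proba(c)))
    return ranked[:batch_size]

# Toy usage: scores stand in for classifier outputs on four citations.
scores = {"c1": 0.95, "c2": 0.52, "c3": 0.10, "c4": 0.48}
pool = list(scores)
print(select_next(pool, scores.get, batch_size=2))  # prints ['c2', 'c4']
```

The reviewer would label the returned citations, the classifier would be retrained on the enlarged labeled set, and the loop would repeat; the paper's strategy additionally folds labeled terms and annotation-time data into this selection step.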