Evaluating automation strategies in language documentation
HLT '09 Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing
Machine involvement has the potential to speed up language documentation. We assess this potential with timed annotation experiments that vary annotator expertise, example selection method, and the presence of label suggestions from a machine classifier. We find that better example selection and label suggestions can improve efficiency, but their effectiveness depends strongly on annotator expertise: our expert performed best with uncertainty-based selection but gained little from suggestions, while our non-expert performed best with random selection and suggestions. The results underscore both the importance of measuring annotation cost reductions in terms of time and the need for cost-sensitive learning methods that adapt to individual annotators.
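The two example selection strategies compared above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy `predict_proba` interface, and the example probabilities are all assumptions made for the sketch. Uncertainty selection picks the unlabeled example whose predicted label distribution has the highest entropy; the random baseline picks uniformly.

```python
import math
import random

def uncertainty_select(unlabeled, predict_proba):
    """Pick the example the classifier is least sure about,
    measured by the entropy of its predicted label distribution."""
    def entropy(probs):
        return -sum(p * math.log(p) for p in probs if p > 0)
    return max(unlabeled, key=lambda x: entropy(predict_proba(x)))

def random_select(unlabeled, rng=random):
    """Baseline: pick any unlabeled example uniformly at random."""
    return rng.choice(unlabeled)

# Toy stand-in for a trained classifier: a fixed label distribution
# per example (hypothetical values, for illustration only).
probs = {"a": [0.9, 0.1], "b": [0.5, 0.5], "c": [0.7, 0.3]}
picked = uncertainty_select(list(probs), probs.get)
# "b" has the most uniform distribution, hence the highest entropy.
```

In an active learning loop, the selected example would be sent to the annotator, labeled (possibly with a machine-suggested label to accept or correct), and added to the training set before retraining.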