Traditional Active Learning (AL) techniques assume that annotating each datum costs the same. This is not the case when annotating sequences: some sequences take longer than others. We show that which AL technique performs best depends on how cost is measured. Applying an hourly cost model based on the results of an annotation user study, we approximate the time required to annotate a given sentence. This model allows us to evaluate the effectiveness of AL sampling methods in terms of time spent in annotation. We achieve a 77% reduction in annotation hours over a random baseline when targeting 96.5% tag accuracy on the Penn Treebank. More significantly, we make the case for measuring cost when assessing AL methods.
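The idea of cost-sensitive selection described in the abstract can be sketched as follows. This is an illustrative example only, not the paper's method: it uses a hypothetical linear time-cost model (the coefficients are made up, not the fitted values from the user study) and ranks candidate sentences by per-token tag entropy divided by estimated annotation time, spending a fixed time budget greedily.

```python
import math

# Hypothetical linear time-cost model: estimated annotation time (seconds)
# grows with sentence length. Coefficients are illustrative placeholders.
def estimated_cost(tokens):
    return 3.0 + 0.5 * len(tokens)

def mean_token_entropy(tag_dists):
    """Mean per-token entropy of a tagger's marginal tag distributions."""
    total = 0.0
    for dist in tag_dists:
        total += -sum(p * math.log(p) for p in dist if p > 0)
    return total / len(tag_dists)

def select_batch(candidates, budget_hours):
    """Greedily pick sentences with the highest uncertainty per estimated
    annotation second until the time budget is exhausted."""
    budget = budget_hours * 3600.0
    scored = sorted(
        candidates,
        key=lambda c: mean_token_entropy(c["tag_dists"])
        / estimated_cost(c["tokens"]),
        reverse=True,
    )
    batch, spent = [], 0.0
    for cand in scored:
        cost = estimated_cost(cand["tokens"])
        if spent + cost <= budget:
            batch.append(cand)
            spent += cost
    return batch, spent
```

Under this scheme a short, highly ambiguous sentence can be preferred over a longer one with the same raw uncertainty, which is exactly the behavior a unit-cost assumption hides.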