Corpus-based methods for natural language processing often use supervised training, requiring expensive manual annotation of training corpora. This paper investigates methods for reducing annotation cost by sample selection. In this approach, the learning program examines many unlabeled examples during training and, at each stage, selects for labeling (annotation) only those that are most informative. This avoids redundantly annotating examples that contribute little new information. This paper extends our previous work on committee-based sample selection for probabilistic classifiers. We describe a family of methods for committee-based sample selection, and report experimental results for the task of stochastic part-of-speech tagging. We find that all variants achieve a significant reduction in annotation cost, though their computational efficiency differs. In particular, the simplest method, which has no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger.
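The selection loop described above can be illustrated with a minimal sketch. It assumes a toy word-level tagger whose parameters P(tag | word) are drawn for each committee member from a Dirichlet distribution over the counts annotated so far, using an add-one prior. The paper's tagger is an HMM, and its scheme for generating committee members may differ, so treat this as an illustration of the general idea rather than the authors' method. Disagreement is measured by vote entropy normalized to [0, 1]. All names (`vote_entropy`, `sample_committee_votes`, `select_for_annotation`, `gain`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def vote_entropy(votes):
    """Entropy of the committee's votes, normalized to [0, 1] by log(k)."""
    k = len(votes)
    if k < 2:
        return 0.0
    counts = Counter(votes)
    h = -sum((c / k) * math.log(c / k) for c in counts.values())
    return h / math.log(k)


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_committee_votes(counts, labels, word, k, rng):
    """Each of k members samples P(tag | word) and votes for its argmax tag.

    Assumption: add-one Dirichlet prior over the annotated tag counts.
    """
    alphas = [counts[word][lab] + 1.0 for lab in labels]
    votes = []
    for _ in range(k):
        probs = sample_dirichlet(alphas, rng)
        best = max(range(len(labels)), key=probs.__getitem__)
        votes.append(labels[best])
    return votes


def select_for_annotation(stream, oracle, labels, k=2, gain=1.0, seed=0):
    """Sequentially scan unlabeled words; label one with prob. gain * entropy.

    `oracle` stands in for the human annotator (hypothetical interface).
    """
    rng = random.Random(seed)
    counts = defaultdict(Counter)  # counts[word][tag] = annotations so far
    selected = []
    for word in stream:
        votes = sample_committee_votes(counts, labels, word, k, rng)
        p = min(1.0, gain * vote_entropy(votes))
        if rng.random() < p:
            tag = oracle(word)
            counts[word][tag] += 1  # update the model with the new label
            selected.append((word, tag))
    return selected, counts
```

With `k=2` and `gain=1.0`, the normalized vote entropy is either 0 or 1, so an example is sent for annotation exactly when the two sampled members disagree. That choice has nothing to tune, in the spirit of the simplest variant the abstract mentions. As counts for a frequent word accumulate, the sampled members increasingly agree, and further occurrences of that word are skipped instead of being redundantly annotated.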