Active learning reduces the number of manually annotated sentences needed to train state-of-the-art statistical parsers. One popular method, uncertainty sampling, selects sentences for which the parser exhibits low certainty. However, this method does not quantify confidence in the current statistical model itself. In particular, we should be less confident about selection decisions that rest on low-frequency events. We present a novel two-stage method which first targets sentences that cannot be reliably selected using uncertainty sampling, and then applies standard uncertainty sampling to the remaining sentences. An evaluation shows that this method outperforms both pure uncertainty sampling and an ensemble method that relies solely on bagged ensemble members.
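The two-stage selection described above can be sketched in outline. This is a minimal illustration under assumed inputs, not the paper's implementation: the per-sentence uncertainty scores and the fraction of low-frequency events backing each score (`rare_fracs`) are hypothetical placeholders standing in for whatever the parser and its statistical model would supply.

```python
def two_stage_select(sentences, scores, rare_fracs, rare_threshold, batch_size):
    """Sketch of two-stage active-learning selection.

    Stage 1: pick sentences whose uncertainty scores rest on too many
    low-frequency events to be trusted (rare_fracs above a threshold).
    Stage 2: fill the rest of the batch by standard uncertainty
    sampling (lowest parser confidence first) over the remainder.
    """
    # Stage 1: selection by uncertainty score is unreliable here,
    # because the score is dominated by low-frequency events.
    unreliable = [i for i in range(len(sentences))
                  if rare_fracs[i] > rare_threshold]
    selected = unreliable[:batch_size]

    # Stage 2: standard uncertainty sampling on the remaining pool.
    remaining = [i for i in range(len(sentences)) if i not in selected]
    remaining.sort(key=lambda i: scores[i])  # lowest confidence first
    selected += remaining[:batch_size - len(selected)]
    return [sentences[i] for i in selected]


# Toy example: sentence "c" is flagged in stage 1 (rare-event fraction
# 0.6 > 0.5); stage 2 then adds "b", the least confident of the rest.
batch = two_stage_select(
    sentences=["a", "b", "c", "d"],
    scores=[0.9, 0.2, 0.5, 0.8],        # parser confidence per sentence
    rare_fracs=[0.1, 0.05, 0.6, 0.0],   # share of low-frequency events
    rare_threshold=0.5,
    batch_size=2,
)
```

The key design point is the ordering: sentences whose scores cannot be trusted are handled first, so unreliable scores never compete directly with reliable ones inside plain uncertainty sampling.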