Active learning reduces the number of manually annotated sentences needed to train state-of-the-art statistical parsers. One popular method, uncertainty sampling, selects sentences for which the parser exhibits low certainty. However, this method does not quantify confidence in the current statistical model itself. In particular, we should be less confident about selection decisions that rest on low-frequency events. We present a novel two-stage method which first targets sentences that cannot be reliably selected using uncertainty sampling, and then applies standard uncertainty sampling to the remaining sentences. An evaluation shows that this method outperforms both pure uncertainty sampling and an ensemble method that relies solely on bagged ensemble members.
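The two-stage selection described above can be sketched in outline. This is a minimal illustration under assumed inputs, not the paper's implementation: the per-sentence uncertainty scores and the fraction of low-frequency events backing each score (`rare_fracs`) are hypothetical placeholders standing in for whatever the parser and its statistical model would supply.

```python
def two_stage_select(sentences, scores, rare_fracs, rare_threshold, batch_size):
    """Sketch of two-stage active-learning selection.

    Stage 1: pick sentences whose uncertainty scores rest on too many
    low-frequency events to be trusted (rare_fracs above a threshold).
    Stage 2: fill the rest of the batch by standard uncertainty
    sampling (lowest parser confidence first) over the remainder.
    """
    # Stage 1: selection by uncertainty score is unreliable here,
    # because the score is dominated by low-frequency events.
    unreliable = [i for i in range(len(sentences))
                  if rare_fracs[i] > rare_threshold]
    selected = unreliable[:batch_size]

    # Stage 2: standard uncertainty sampling on the remaining pool.
    remaining = [i for i in range(len(sentences)) if i not in selected]
    remaining.sort(key=lambda i: scores[i])  # lowest confidence first
    selected += remaining[:batch_size - len(selected)]
    return [sentences[i] for i in selected]


# Toy example: sentence "c" is flagged in stage 1 (rare-event fraction
# 0.6 > 0.5); stage 2 then adds "b", the least confident of the rest.
batch = two_stage_select(
    sentences=["a", "b", "c", "d"],
    scores=[0.9, 0.2, 0.5, 0.8],        # parser confidence per sentence
    rare_fracs=[0.1, 0.05, 0.6, 0.0],   # share of low-frequency events
    rare_threshold=0.5,
    batch_size=2,
)
```

The key design point is the ordering: sentences whose scores cannot be trusted are handled first, so unreliable scores never compete directly with reliable ones inside plain uncertainty sampling.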