Active learning and logarithmic opinion pools for HPSG parse selection

Authors:
Jason Baldridge;Miles Osborne
Affiliations:
Department of Linguistics, University of Texas at Austin, Austin, TX 78712, USA, e-mail: jbaldrid@mail.utexas.edu;School of Informatics, University of Edinburgh, Edinburgh EH8 9LW, UK, e-mail: miles@inf.ed.ac.uk
Venue:
Natural Language Engineering
Year:
2008


Abstract

For complex tasks such as parse selection, the creation of labelled training sets can be extremely costly. Resource-efficient schemes for creating informative labelled material must therefore be considered. We investigate the relationship between two broad strategies for reducing the amount of manual labelling necessary to train accurate parse selection models: ensemble models and active learning. We show that popular active learning methods for reducing annotation costs can be outperformed by instead using a model class which uses the available labelled data more efficiently. For this, we use a simple type of ensemble model called the Logarithmic Opinion Pool (LOP). We furthermore show that LOPs themselves can benefit from active learning. As predicted by a theoretical explanation of the predictive power of LOPs, a detailed analysis of active learning using LOPs shows that component model diversity is a strong predictor of successful LOP performance. Other contributions include a novel active learning method, a justification of our simulation studies using timing information, and cross-domain verification of our main ideas using text classification.
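
As a rough illustration of the ensemble described in the abstract, the sketch below uses the standard definition of a logarithmic opinion pool: a weighted geometric mean of the component distributions, renormalised to sum to one. It also computes the weighted divergence of the components from the pooled distribution, which is one common way to quantify the component model diversity the abstract refers to. The function names, the three toy component models, the four candidate parses, and the uniform weights are all hypothetical, and the entropy-based selection at the end is plain uncertainty sampling, used here as a stand-in rather than as the paper's own selection method.

```python
import math


def lop(distributions, weights):
    """Pool component distributions over the same candidates.

    Assumes every probability is strictly positive and the weights sum to 1.
    """
    n = len(distributions[0])
    # Weighted sum of log-probabilities = log of the weighted geometric mean.
    log_scores = [
        sum(w * math.log(p[k]) for p, w in zip(distributions, weights))
        for k in range(n)
    ]
    top = max(log_scores)  # subtract the maximum for numerical stability
    unnormalised = [math.exp(s - top) for s in log_scores]
    z = sum(unnormalised)
    return [u / z for u in unnormalised]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), skipping zero-mass terms of p."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def entropy(p):
    """Shannon entropy (in nats) of a distribution."""
    return -sum(a * math.log(a) for a in p if a > 0)


def most_uncertain(pool, weights):
    """Uncertainty sampling (an assumption, not the paper's method):
    return the id of the item whose pooled distribution has highest entropy."""
    return max(pool, key=lambda sid: entropy(lop(pool[sid], weights)))


# Hypothetical data: three component models scoring four candidate parses.
components = [
    [0.70, 0.10, 0.10, 0.10],
    [0.40, 0.40, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
weights = [1 / 3, 1 / 3, 1 / 3]

pooled = lop(components, weights)

# Diversity ("ambiguity") term: weighted divergence of each component
# from the pooled distribution. It is zero when all components agree.
ambiguity = sum(w * kl(pooled, p) for p, w in zip(components, weights))

# Hypothetical unlabelled pool: sentence id -> component distributions.
unlabelled = {"s1": components, "s2": [[0.25, 0.25, 0.25, 0.25]] * 3}
next_to_label = most_uncertain(unlabelled, weights)
```

With weights that sum to one, the divergence of the pooled distribution from any reference distribution equals the weighted average divergence of the components minus the ambiguity term, so for a fixed average component quality, more diverse components give a better pool.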