Active learning and logarithmic opinion pools for HPSG parse selection

  • Authors:
  • Jason Baldridge; Miles Osborne

  • Affiliations:
  • Department of Linguistics, University of Texas at Austin, Austin, TX 78712, USA (e-mail: jbaldrid@mail.utexas.edu); School of Informatics, University of Edinburgh, Edinburgh EH8 9LW, UK (e-mail: miles@inf.ed.ac.uk)

  • Venue:
  • Natural Language Engineering
  • Year:
  • 2008

Abstract

For complex tasks such as parse selection, the creation of labelled training sets can be extremely costly. Resource-efficient schemes for creating informative labelled material must therefore be considered. We investigate the relationship between two broad strategies for reducing the amount of manual labelling necessary to train accurate parse selection models: ensemble models and active learning. We show that popular active learning methods for reducing annotation costs can be outperformed by instead using a model class which uses the available labelled data more efficiently. For this, we use a simple type of ensemble model called the Logarithmic Opinion Pool (LOP). We furthermore show that LOPs themselves can benefit from active learning. As predicted by a theoretical explanation of the predictive power of LOPs, a detailed analysis of active learning using LOPs shows that component model diversity is a strong predictor of successful LOP performance. Other contributions include a novel active learning method, a justification of our simulation studies using timing information, and cross-domain verification of our main ideas using text classification.
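To make the two ideas in the abstract concrete, here is a minimal sketch (not the authors' implementation) of a logarithmic opinion pool over the candidate parses of one sentence, p_LOP(y|x) = (1/Z(x)) * prod_i p_i(y|x)^w_i, together with the standard "ambiguity" diversity term. For weights that sum to one, that term satisfies the identity K(q, p_LOP) = sum_i w_i K(q, p_i) - sum_i w_i K(p_LOP, p_i). This identity is why more diverse components can yield a better pool. The selection step uses plain entropy-based uncertainty sampling, a generic active learning method swapped in for illustration. It is not the novel method proposed in the paper. All function names, the fixed weights, and the toy distributions are hypothetical. The sketch also assumes every component assigns strictly positive probability to each candidate parse.

```python
import math


def lop(component_dists, weights):
    """Logarithmic opinion pool: p(y) is proportional to prod_i p_i(y) ** w_i.

    Assumes every component assigns strictly positive probability to every
    candidate parse (math.log(0) would raise ValueError).
    """
    n = len(component_dists[0])
    log_scores = [
        sum(w * math.log(dist[y]) for dist, w in zip(component_dists, weights))
        for y in range(n)
    ]
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_scores)
    unnorm = [math.exp(s - top) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def kl(p, q):
    """KL divergence K(p, q); terms with p(y) == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def diversity(component_dists, weights):
    """Ambiguity term: weighted KL of the pooled model from each component."""
    pooled = lop(component_dists, weights)
    return sum(w * kl(pooled, d) for d, w in zip(component_dists, weights))


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def select_most_uncertain(pool, weights):
    """Uncertainty sampling: pick the unlabelled sentence whose pooled
    distribution over its candidate parses has the highest entropy.

    pool maps a (hypothetical) sentence id to the list of component
    distributions over that sentence's candidate parses.
    """
    return max(pool, key=lambda s: entropy(lop(pool[s], weights)))


# Usage: two component models, equal weights, two unlabelled sentences.
weights = [0.5, 0.5]
pool = {
    "s1": [[0.9, 0.1], [0.8, 0.2]],  # components agree and are confident
    "s2": [[0.6, 0.4], [0.3, 0.7]],  # components disagree
}
chosen = select_most_uncertain(pool, weights)  # "s2"
```

In a real parse selection setting, each component would be a separately trained log-linear model, and the weights might be tuned rather than fixed. The selected sentence would then be sent for manual labelling and added to the training set before the components are retrained.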