Reducing weight undertraining in structured discriminative learning

  • Authors:
  • Charles Sutton; Michael Sindelar; Andrew McCallum

  • Affiliations:
  • University of Massachusetts Amherst, Amherst, MA (all authors)

  • Venue:
  • HLT-NAACL '06: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Main Conference
  • Year:
  • 2006

Abstract

Discriminative probabilistic models are very popular in NLP because of the latitude they afford in designing features. But training involves complex trade-offs among weights, which can be dangerous: a few highly-indicative features can swamp the contribution of many individually weaker features, causing their weights to be undertrained. Such a model is less robust, for the highly-indicative features may be noisy or missing in the test data. To ameliorate this weight undertraining, we introduce several new feature bagging methods, in which separate models are trained on subsets of the original features, and combined using a mixture model or a product of experts. These methods include the logarithmic opinion pools used by Smith et al. (2005). We evaluate feature bagging on linear-chain conditional random fields for two natural-language tasks. On both tasks, the feature-bagged CRF performs better than simply training a single CRF on all the features.
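The abstract describes feature bagging only at a high level: train separate models on subsets of the features, then combine their predictions with a mixture model or a product of experts (the logarithmic opinion pool of Smith et al., 2005). The sketch below is a toy illustration of the product-of-experts combination on a single-label log-linear classifier rather than a full linear-chain CRF; the function names, the toy features, and the equal pool weights are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Toy sketch of feature bagging with a logarithmic opinion pool (product of
# experts). Each "bagged model" is a log-linear classifier trained on its own
# feature subset; the pool combines them as a weighted geometric mean:
#   p(y|x) proportional to prod_k p_k(y|x)^alpha_k
# (This is a simplification to single-label classification; the paper applies
# the idea to linear-chain CRFs.)

def log_potentials(x, labels, feature_bag, weights):
    """Unnormalized log-scores for each label under one bagged model."""
    return np.array([
        sum(w * f(x, y) for f, w in zip(feature_bag, weights))
        for y in labels
    ])

def logarithmic_opinion_pool(x, labels, bagged_models, alphas):
    """Combine bagged models by summing their weighted log-probabilities."""
    combined = np.zeros(len(labels))
    for (feature_bag, weights), alpha in zip(bagged_models, alphas):
        scores = log_potentials(x, labels, feature_bag, weights)
        log_probs = scores - np.logaddexp.reduce(scores)  # normalize each expert
        combined += alpha * log_probs
    combined -= np.logaddexp.reduce(combined)             # renormalize the pool
    return np.exp(combined)

# Hypothetical usage: two feature bags over a two-label tagging decision.
labels = ["PER", "O"]
f_cap = lambda x, y: float(x[0].isupper() and y == "PER")  # highly indicative feature
f_len = lambda x, y: float(len(x) > 3 and y == "PER")      # individually weaker feature
bagged_models = [([f_cap], [2.0]), ([f_len], [0.5])]       # trained separately per bag
alphas = [0.5, 0.5]                                        # equal pool weights
print(logarithmic_opinion_pool("Smith", labels, bagged_models, alphas))
```

Because each bag is trained without the other bags' features, the weaker features cannot be swamped by the strong one during training; the pool then lets both contribute at test time, which is the robustness argument the abstract makes.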