Discriminative n-gram language modeling

  • Authors:
  • Brian Roark; Murat Saraclar; Michael Collins

  • Affiliations:
  • Center for Spoken Language Understanding, OGI School of Science and Engineering at Oregon Health and Science University, 20000 NW Walker Road, Beaverton, OR 97006, United States; Boğaziçi University, 34342 Bebek, Istanbul, Turkey; MIT CSAIL/EECS Stata Center, Building 32-G484, Cambridge, MA 02139, United States

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2007

Abstract

This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm and a method based on maximizing the regularized conditional log-likelihood. The models are encoded as deterministic weighted finite-state automata and are applied by intersecting the automata with word lattices output by a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. We describe a method based on regularized likelihood that uses the feature set given by the perceptron algorithm and is initialized with the perceptron's weights; this method gives an additional 0.5% reduction in word error rate (WER) over training with the perceptron alone. The final system achieves a 1.8% absolute WER reduction for a baseline first-pass recognition system (from 39.2% to 37.4%) and a 0.9% absolute WER reduction for a multi-pass recognition system (from 28.9% to 28.0%).
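
As a rough illustration of the perceptron step the abstract summarizes, the sketch below shows a structured-perceptron update for discriminative n-gram reranking in Python. It is a simplified sketch under stated assumptions, not the paper's implementation: the paper applies the model by intersecting a deterministic weighted automaton with the full word lattice and uses parameter averaging, whereas this sketch works over hypothetical n-best lists; the function names, data layout, and trigram order are illustrative assumptions.

```python
from collections import defaultdict

def ngram_features(words, n=3):
    """Count the n-grams (orders 1..n) of a hypothesis; these counts
    serve as the discriminative features of the model."""
    feats = defaultdict(int)
    padded = ["<s>"] * (n - 1) + list(words) + ["</s>"]
    for order in range(1, n + 1):
        for i in range(len(padded) - order + 1):
            feats[tuple(padded[i:i + order])] += 1
    return feats

def model_score(weights, hyp_words, baseline_score, baseline_weight=1.0):
    """Combined score: baseline recognizer score plus learned n-gram weights."""
    feats = ngram_features(hyp_words)
    return baseline_weight * baseline_score + sum(
        weights.get(f, 0.0) * c for f, c in feats.items())

def perceptron_epoch(weights, training_data):
    """One structured-perceptron pass over n-best lists.

    training_data: iterable of (nbest, oracle_index) pairs, where nbest is
    a list of (words, baseline_score) hypotheses and oracle_index marks the
    lowest-WER hypothesis. This data layout is a hypothetical stand-in for
    the paper's lattice-based setup.
    """
    for nbest, oracle_index in training_data:
        # Hypothesis the current model would pick.
        best = max(range(len(nbest)),
                   key=lambda i: model_score(weights, nbest[i][0], nbest[i][1]))
        if best == oracle_index:
            continue  # model already agrees with the oracle; no update
        # Promote the oracle's n-grams, demote those of the wrong choice.
        for f, c in ngram_features(nbest[oracle_index][0]).items():
            weights[f] = weights.get(f, 0.0) + c
        for f, c in ngram_features(nbest[best][0]).items():
            weights[f] = weights.get(f, 0.0) - c
    return weights
```

Only n-grams touched by an update ever receive non-zero weight, which is how the perceptron implicitly selects a compact feature set in a few passes; per the abstract, those features and weights then seed the regularized conditional log-likelihood training that yields the further 0.5% WER reduction.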