Large margin training of acoustic models for speech recognition

  • Authors:
  • Lawrence K. Saul; Fei Sha

  • Affiliations:
  • University of Pennsylvania; University of Pennsylvania

  • Year:
  • 2007

Abstract

Automatic speech recognition (ASR) depends critically on building acoustic models for linguistic units. These acoustic models usually take the form of continuous-density hidden Markov models (CD-HMMs), whose parameters are obtained by maximum likelihood estimation. Recently, however, there has been growing interest in discriminative methods for parameter estimation in CD-HMMs. This thesis applies the idea of large margin training to parameter estimation in CD-HMMs. The principles of large margin training have been intensively studied, most prominently in support vector machines (SVMs). In SVMs, large margin training presents an attractive conceptual framework because it provides theoretical guarantees that balance model complexity against generalization. It also presents an attractive computational framework because it casts many learning problems as tractable convex optimizations. This thesis extends and develops large margin methods for estimating the parameters of acoustic models for ASR. As in SVMs, the starting point is to postulate that correct and incorrect classifications are separated by a large margin; model parameters are then optimized to maximize this margin. This thesis presents algorithms for training Gaussian mixture models both as multiway classifiers in their own right and as individual components of larger models (e.g., observation models in CD-HMMs). The new techniques differ from previous discriminative methods for ASR in their explicit goal of margin maximization. Additionally, the new techniques lead to efficient algorithms based on convex optimizations. This thesis evaluates the utility of large margin training on two benchmark problems in acoustic modeling: phonetic classification and recognition on the TIMIT speech database. In both tasks, large margin systems obtain significantly better performance than systems trained by maximum likelihood estimation or competing discriminative frameworks, such as conditional maximum likelihood and minimum classification error. This thesis also examines the utility of subgradient and extragradient methods, both of which were recently proposed for large margin training in domains other than ASR. Comparative experimental results suggest that our learning methods both scale better and yield better performance. The thesis concludes with brief discussions of future research directions, including the application of large margin training techniques to large vocabulary ASR.
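
To make the margin criterion described above more concrete, the display below sketches one objective in that spirit. The notation is an informal illustration, not necessarily the thesis's exact formulation: the augmented vector z = [x; 1], the positive semidefinite class matrices Φ_c, the unit margin, and the trace regularizer are all assumptions. Each class scores an input with a quadratic discriminant (lower is better), and hinge penalties charge any example whose correct class does not beat every competitor by the required margin.

```latex
% Informal sketch of a large-margin objective for multiway Gaussian-style
% classification (illustrative notation, not the thesis's exact formulation).
% Each class c scores an augmented input z = [x; 1] with a quadratic
% discriminant D_c(x) = z^T Phi_c z, where smaller is better.
\[
  \min_{\{\Phi_c \,\succeq\, 0\}}
  \;\sum_{n}\sum_{c \neq y_n}
  \Big[\, 1 + z_n^{\top}\Phi_{y_n} z_n - z_n^{\top}\Phi_c z_n \,\Big]_{+}
  \;+\; \lambda \sum_{c}\operatorname{tr}(\Phi_c)
\]
% Each hinge term is affine in the matrices Phi_c and the constraint set
% {Phi_c >= 0} is convex, so the overall problem is a convex optimization
% with no spurious local minima.
```

Convexity in the class matrices is the source of the computational appeal noted in the abstract: the optimization can be carried out reliably without getting trapped in poor local solutions.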
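
The abstract also mentions subgradient methods for large margin training. The NumPy sketch below is a hypothetical illustration of that idea under the objective sketched above; the function names, step size, epoch count, and projection step are assumptions for illustration, not code or settings from the thesis.

```python
# Hypothetical sketch: projected subgradient training of per-class quadratic
# discriminants under a hinge-style large-margin loss (not the thesis's code).
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the positive semidefinite cone."""
    M = (M + M.T) / 2.0
    w, V = np.linalg.eigh(M)
    return (V * np.maximum(w, 0.0)) @ V.T

def train_large_margin(X, y, num_classes, lr=1e-3, epochs=20, margin=1.0):
    """X: (N, d) features; y: (N,) integer labels in [0, num_classes)."""
    N, d = X.shape
    Z = np.hstack([X, np.ones((N, 1))])            # augmented inputs z = [x; 1]
    Phi = np.stack([np.eye(d + 1) for _ in range(num_classes)])  # PSD init

    for _ in range(epochs):
        for z, c_true in zip(Z, y):
            # quadratic discriminants D_c(x) = z' Phi_c z (smaller is better)
            scores = np.einsum('i,kij,j->k', z, Phi, z)
            outer = np.outer(z, z)
            for c in range(num_classes):
                if c == c_true:
                    continue
                # hinge penalty if the competitor does not lose by >= margin
                if margin + scores[c_true] - scores[c] > 0:
                    Phi[c_true] -= lr * outer      # pull correct class closer
                    Phi[c] += lr * outer           # push competitor away
        # keep every class matrix positive semidefinite
        for c in range(num_classes):
            Phi[c] = project_psd(Phi[c])
    return Phi

def predict(Phi, X):
    """Assign each input to the class with the lowest quadratic score."""
    Z = np.hstack([X, np.ones((X.shape[0], 1))])
    scores = np.einsum('ni,kij,nj->nk', Z, Phi, Z)
    return scores.argmin(axis=1)
```

In practice one would batch the subgradient steps, tune the step size, and include the trace regularizer from the objective above; the sketch only shows the basic margin-driven updates and the projection that keeps the class matrices positive semidefinite.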