Maximum entropy direct models for speech recognition

Authors:
Hong-Kwang Jeff Kuo; Yuqing Gao
Affiliations:
IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Abstract

Traditional statistical models for speech recognition have mostly been based on a Bayesian framework using generative models such as hidden Markov models (HMMs). This paper focuses on a new framework for speech recognition using maximum entropy direct modeling, where the probability of a state or word sequence given an observation sequence is computed directly from the model. In contrast to HMMs, features can be asynchronous and overlapping. This model therefore allows for the potential combination of many different types of features, which need not be statistically independent of each other. In this paper, a specific kind of direct model, the maximum entropy Markov model (MEMM), is studied. Even with conventional acoustic features, the approach already shows promising results for phone-level decoding. The MEMM significantly outperforms traditional HMMs in word error rate when used as stand-alone acoustic models. Preliminary results combining the MEMM scores with HMM and language model scores show modest improvements over the best HMM speech recognizer.
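
To make the idea of direct modeling concrete, the following is a minimal sketch of a generic MEMM, not the authors' implementation. It assumes the usual MEMM factorization, in which the probability of a state sequence given the observations is a product of local maximum entropy (softmax) distributions P(s_t | s_(t-1), o_t), and it decodes with a Viterbi search. The function names, the feature names (energy_low, energy_high, voiced), the state labels, and all weights are hypothetical and chosen only for illustration; the features used in the paper may be quite different.

```python
import math


def memm_local_log_probs(weights, prev_state, features, states):
    """Return log P(s | prev_state, o) for every state s.

    weights maps (prev_state, state, feature_name) -> float; missing
    entries count as 0. features maps feature_name -> value for the
    current observation. Features may overlap freely, since no
    independence between them is assumed.
    """
    scores = {
        s: sum(weights.get((prev_state, s, name), 0.0) * value
               for name, value in features.items())
        for s in states
    }
    top = max(scores.values())  # subtract the max for numerical stability
    log_z = top + math.log(sum(math.exp(v - top) for v in scores.values()))
    return {s: v - log_z for s, v in scores.items()}


def memm_viterbi(weights, observations, states, start="<s>"):
    """Most likely state sequence under the product of local distributions."""
    # best[s] = (log-probability, path) of the best sequence ending in s
    first = memm_local_log_probs(weights, start, observations[0], states)
    best = {s: (first[s], [s]) for s in states}
    for features in observations[1:]:
        new_best = {}
        for prev, (logp, path) in best.items():
            local = memm_local_log_probs(weights, prev, features, states)
            for s in states:
                cand = logp + local[s]
                if s not in new_best or cand > new_best[s][0]:
                    new_best[s] = (cand, path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy data (entirely made up): two states and three frames.
states = ["sil", "ah"]
weights = {
    ("<s>", "sil", "energy_low"): 2.0,
    ("sil", "sil", "energy_low"): 1.0,
    ("sil", "ah", "energy_high"): 2.0,
    ("ah", "ah", "energy_high"): 1.0,
    ("ah", "sil", "energy_low"): 2.0,
}
observations = [
    {"energy_low": 1.0},
    {"energy_high": 1.0, "voiced": 1.0},  # two overlapping features fire
    {"energy_high": 1.0},
]
decoded = memm_viterbi(weights, observations, states)
print(decoded)
```

Because each local distribution is normalized over the next state only, any number of correlated or overlapping features can be added to the feature dictionaries without changing the decoder, which is the property the abstract contrasts with HMMs.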