The subspace Gaussian mixture model: A structured model for speech recognition

Authors:
Daniel Povey;Lukáš Burget;Mohit Agarwal;Pinar Akyazi;Feng Kai;Arnab Ghoshal;Ondřej Glembek;Nagendra Goel;Martin Karafiát;Ariya Rastrow;Richard C. Rose;Petr Schwarz;Samuel Thomas
Affiliations:
Microsoft Research, Redmond, WA, USA;Brno University of Technology, Czech Republic;IIIT Allahabad, India;Bogaziçi University, Istanbul, Turkey;Hong Kong University of Science and Technology, Hong Kong, China;Saarland University, Saarbrücken, Germany;Brno University of Technology, Czech Republic;Go-Vivace Inc., Virginia, USA;Brno University of Technology, Czech Republic;Johns Hopkins University, Baltimore, MD, USA;McGill University, Montreal, Canada;Brno University of Technology, Czech Republic;Johns Hopkins University, Baltimore, MD, USA
Venue:
Computer Speech and Language
Year:
2011

Abstract

We describe a new approach to speech recognition, in which all Hidden Markov Model (HMM) states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state. The model is defined by vectors associated with each state with a dimension of, say, 50, together with a global mapping from this vector space to the space of parameters of the GMM. This model appears to give better results than a conventional model, and the extra structure offers many new opportunities for modeling innovations while maintaining compatibility with most standard techniques.
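The mapping from a per-state vector to that state's GMM parameters can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the basic form of the mapping, which the abstract does not spell out: each Gaussian's mean is a linear projection of the state vector, and the mixture weights are a softmax over dot products with the state vector. The names `make_global_mapping` and `state_gmm`, the random initialization, and the toy dimensions are hypothetical, and covariances are left out for brevity.

```python
import math
import random


def make_global_mapping(num_gauss, feat_dim, sub_dim, seed=0):
    """Build hypothetical global parameters shared by all HMM states.

    Returns one projection matrix M_i (feat_dim x sub_dim) and one
    weight-projection vector w_i (sub_dim) per Gaussian index i.
    Values are random here; in a real system they would be trained.
    """
    rng = random.Random(seed)
    M = [[[rng.gauss(0.0, 0.1) for _ in range(sub_dim)]
          for _ in range(feat_dim)]
         for _ in range(num_gauss)]
    w = [[rng.gauss(0.0, 0.1) for _ in range(sub_dim)]
         for _ in range(num_gauss)]
    return M, w


def state_gmm(v, M, w):
    """Map one state vector v to that state's GMM means and weights.

    Assumed form (not stated in the abstract):
      mean_i   = M_i v
      weight_i = exp(w_i . v) / sum_k exp(w_k . v)
    """
    means = [[sum(m * x for m, x in zip(row, v)) for row in M_i]
             for M_i in M]
    logits = [sum(a * x for a, x in zip(w_i, v)) for w_i in w]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return means, weights


# Toy usage: 4 shared Gaussians, 3-dimensional features, 50-dimensional state vector.
M, w = make_global_mapping(num_gauss=4, feat_dim=3, sub_dim=50)
rng = random.Random(1)
v = [rng.gauss(0.0, 1.0) for _ in range(50)]
means, weights = state_gmm(v, M, w)
```

In this sketch every state has the same number of Gaussians because `M` and `w` are global. Only the 50-dimensional vector `v` is specific to a state, which is the compactness the abstract describes.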