The application of hidden Markov models in speech recognition

Authors:
Mark Gales; Steve Young
Affiliations:
Cambridge University Engineering Department, Cambridge, UK; Cambridge University Engineering Department, Cambridge, UK
Venue:
Foundations and Trends in Signal Processing
Year:
2007


Abstract

Hidden Markov Models (HMMs) provide a simple and effective framework for modelling time-varying spectral vector sequences. As a consequence, almost all present-day large vocabulary continuous speech recognition (LVCSR) systems are based on HMMs. Whereas the basic principles underlying HMM-based LVCSR are rather straightforward, the approximations and simplifying assumptions involved in a direct implementation of these principles would result in a system which has poor accuracy and unacceptable sensitivity to changes in operating environment. Thus, the practical application of HMMs in modern systems involves considerable sophistication. The aim of this review is first to present the core architecture of an HMM-based LVCSR system and then to describe the various refinements which are needed to achieve state-of-the-art performance. These refinements include feature projection, improved covariance modelling, discriminative parameter estimation, adaptation and normalisation, noise compensation and multi-pass system combination. The review concludes with a case study of LVCSR for Broadcast News and Conversation transcription in order to illustrate the techniques described.