Speech recognition using augmented conditional random fields

Authors: Yasser Hifny; Steve Renals
Affiliations: IBM T. J. Watson Research Center, Yorktown Heights, NY; Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
Venue: IEEE Transactions on Audio, Speech, and Language Processing
Year: 2009

References (23)

Optimal brain damage. Advances in Neural Information Processing Systems 2.
A training algorithm for optimal margin classifiers. COLT '92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory.
Hidden Markov models, maximum mutual information estimation, and the speech recognition problem.
Inducing Features of Random Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Statistical methods for speech recognition.
Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition. Speech Communication.
Exploiting generative models in discriminative classifiers. Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II.
Comparison of discriminative training criteria and optimization methods for speech recognition. Speech Communication.
Neural Networks: A Comprehensive Foundation.
Neural Networks for Pattern Recognition.
Spoken Language Processing: A Guide to Theory, Algorithm, and System Development.
Connectionist Speech Recognition: A Hybrid Approach.
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML '01: Proceedings of the Eighteenth International Conference on Machine Learning.
Efficient BackProp. Neural Networks: Tricks of the Trade (an outgrowth of a 1996 NIPS workshop).
The acoustic-modeling problem in automatic speech recognition.
Large-vocabulary speaker-independent continuous speech recognition: the sphinx system.
Grafting: fast, incremental feature selection by gradient descent in function space. The Journal of Machine Learning Research.
A comparison of algorithms for maximum entropy parameter estimation. COLING-02: Proceedings of the 6th Conference on Natural Language Learning, Volume 20.
Accelerated training of conditional random fields with stochastic gradient methods. ICML '06: Proceedings of the 23rd International Conference on Machine Learning.
Decision trees for phonological rules in continuous speech. ICASSP '91: Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing.
What HMMs Can Do. IEICE Transactions on Information and Systems.
Buried Markov models for speech recognition. ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 02.
Maximum likelihood discriminant feature spaces. ICASSP '00: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 02.

Cited by (2)

Investigations into the Crandem approach to word recognition. HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
Pretreatment for speech machine translation. ICCCI '10: Proceedings of the Second International Conference on Computational Collective Intelligence: Technologies and Applications, Volume Part II.

Abstract

Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice for warping the time axis and modeling the temporal phenomena in the speech signal, their conditional independence properties limit their ability to model spectral phenomena well. In this paper, a new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed. This paradigm addresses some limitations of HMMs while retaining many of the aspects that have made them successful. In particular, the acoustic modeling problem is reformulated in a data-driven, sparse, augmented space to increase discrimination. Acoustic context modeling is explicitly integrated to handle the sequential phenomena of the speech signal. We present an efficient framework for estimating these models that ensures scalability and generality. On the TIMIT phone recognition task, a phone error rate of 23.0% was recorded on the full test set, a significant improvement over comparable HMM-based systems.
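ACRFs are conditional random fields, and CRFs are typically trained by maximizing the conditional log-likelihood of the label sequence given the observations. The sketch below is a generic linear-chain CRF in the sense of Lafferty et al. (2001), not the authors' ACRF estimation framework. It computes log p(y | x) with the forward algorithm in the log domain, assuming the per-frame label scores (`emissions`) and label-transition scores (`transitions`) have already been computed from some feature set. The function and parameter names are hypothetical.

```python
import numpy as np


def logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def crf_log_likelihood(emissions, transitions, labels):
    """Conditional log-likelihood log p(labels | x) of a generic linear-chain CRF.

    emissions:   (T, K) array, score of each of K labels at each of T frames
    transitions: (K, K) array, transitions[i, j] = score of label i followed by label j
    labels:      length-T sequence of integer label indices (hypothetical reference)
    """
    T, _ = emissions.shape
    # Unnormalized score of the reference label sequence.
    score = emissions[0, labels[0]]
    for t in range(1, T):
        score += transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]]
    # Forward recursion: alpha[j] = log-sum of scores of all prefixes ending in label j.
    alpha = emissions[0].copy()
    for t in range(1, T):
        alpha = logsumexp(alpha[:, None] + transitions, axis=0) + emissions[t]
    log_z = logsumexp(alpha, axis=0)  # log partition function log Z(x)
    return score - log_z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emissions = rng.normal(size=(5, 4))    # 5 frames, 4 hypothetical labels
    transitions = rng.normal(size=(4, 4))
    print(crf_log_likelihood(emissions, transitions, [0, 0, 1, 2, 2]))
```

Working in the log domain avoids underflow on long utterances. The gradient of this objective with respect to the scores can be obtained with a matching backward pass or with automatic differentiation.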
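The abstract reports results as a phone error rate (PER). As a minimal sketch, assuming the usual definition (substitutions, deletions and insertions from a minimum edit-distance alignment, divided by the number of reference phones), PER can be computed as follows. The function name and the example phone strings are hypothetical. TIMIT results are commonly reported after folding the phone set to a smaller inventory (often 39 classes); that mapping step is omitted here.

```python
def phone_error_rate(reference, hypothesis):
    """PER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one phone")
    n, m = len(reference), len(hypothesis)
    # d[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # i deletions
    for j in range(m + 1):
        d[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[n][m] / n


if __name__ == "__main__":
    ref = ["dh", "ax", "k", "ae", "t"]
    hyp = ["dh", "ah", "k", "ae"]
    print(phone_error_rate(ref, hyp))  # one substitution + one deletion over 5 phones: 0.4
```

Because insertions are counted, PER can exceed 100% when the hypothesis is much longer than the reference.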