Trends and advances in speech recognition

Authors:
M. Picheny; D. Nahamoo; V. Goel; B. Kingsbury; B. Ramabhadran; S. J. Rennie; G. Saon
Affiliations:
IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY (all authors)
Venue:
IBM Journal of Research and Development
Year:
2011

Abstract

One of the earliest successful applications of machine-learning techniques to pattern recognition was the use of information-theoretic principles in speech recognition. Earlier approaches relied heavily on expert input, with painstaking analysis of data to relate speech signals to the word sequences that produced them. Such methodologies were completely displaced once the speech recognition problem was cast in a probabilistic framework that models the joint probability distribution of speech signals and word sequences. Since the beginning of the 21st century, the amount of data and computation available to train and build models has increased exponentially, and the emergence of new machine-learning algorithms and methodologies has opened new vistas in approaching complex pattern recognition problems. These advances are enabled by a new set of machine-learning techniques referred to as graphical models, which have computationally tractable training algorithms. Closely related are neural-network modeling techniques, and there has been a resurgence of interest in applying neural-network concepts, such as deep networks, to speech recognition. The explosion of data has driven the development of new ways to capture the key features in massive amounts of data, using efficient methods that rely on exemplar-based sparse representations. Lastly, all of these different approaches can be tied together in a principled fashion using another variation of graphical models: an exponential model framework. This paper describes the current state of the art in speech recognition systems and highlights the developments that are expected to produce major breakthroughs in our ability to automatically recognize speech using computers.