A vector Taylor series approach for environment-independent speech recognition

Authors:
P. J. Moreno;B. Raj;R. M. Stern
Affiliations:
Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
Venue:
ICASSP '96 Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 02
Year:
1996

Citing 0
Cited 34

Abstract

In this paper we introduce a new analytical approach to environment compensation for speech recognition. Previous analytical attempts at the problem of noisy speech recognition have either used an overly simplified mathematical description of the effects of noise on the statistics of speech or relied on the availability of large environment-specific adaptation sets. Some of these methods required adaptation data consisting of simultaneously recorded ("stereo") recordings of clean and degraded speech. In this work we use a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel. The VTS approach is computationally efficient. It can be applied either to the incoming speech feature vectors or to the statistics representing these vectors. In the first case the speech is compensated and then recognized; in the second case the HMM statistics are modified using the VTS formulation. Both approaches use only the speech segment being recognized to compute the parameters required for environmental compensation. We evaluate the performance of two implementations of VTS algorithms using the CMU SPHINX-II system on the 100-word alphanumeric CENSUS database and on the 1993 5000-word ARPA Wall Street Journal database, with artificial white Gaussian noise added to both databases. The VTS approaches provide significant improvements in recognition accuracy compared to previous algorithms.
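As an illustration of the kind of expansion the abstract describes, the sketch below assumes the commonly used log-spectral environment model y = x + h + log(1 + exp(n - x - h)), where x is clean speech, n is additive noise and h is the linear channel, all per log-spectral channel. It linearizes this model around the mean of a clean-speech Gaussian with diagonal covariance. The function names, the diagonal-covariance assumption, and the treatment of n and h as fixed, known vectors are simplifications made here for illustration; the abstract does not specify these details, and the paper estimates the environment parameters from the utterance itself.

```python
import math

def mismatch(x, n, h):
    """g(x, n, h) = log(1 + exp(n - x - h)) for each log-spectral channel.

    Assumed environment model; no overflow protection for very large n - x - h.
    """
    return [math.log1p(math.exp(ni - xi - hi)) for xi, ni, hi in zip(x, n, h)]

def noisy_log_spectrum(x, n, h):
    """Exact noisy log spectrum under the assumed model: y = x + h + g(x, n, h)."""
    g = mismatch(x, n, h)
    return [xi + hi + gi for xi, hi, gi in zip(x, h, g)]

def vts_first_order(mu_x, var_x, n, h):
    """First-order Taylor approximation of the noisy-speech Gaussian.

    Expanding y around x = mu_x (n and h held fixed, an assumption of this sketch):
        mean:      mu_y  ~= mu_x + h + g(mu_x, n, h)
        slope:     dy/dx  = 1 / (1 + exp(n - mu_x - h))
        variance:  var_y ~= (dy/dx)^2 * var_x
    """
    mu_y = noisy_log_spectrum(mu_x, n, h)
    var_y = []
    for m, v, ni, hi in zip(mu_x, var_x, n, h):
        slope = 1.0 / (1.0 + math.exp(ni - m - hi))
        var_y.append(slope * slope * v)
    return mu_y, var_y

# Hypothetical three-channel example: high, equal, and low speech-to-noise level.
mu_y, var_y = vts_first_order(
    mu_x=[20.0, 0.0, -20.0],
    var_x=[1.0, 1.0, 1.0],
    n=[0.0, 0.0, 0.0],
    h=[0.0, 0.0, 0.0],
)
```

Under this model, a channel where speech dominates the noise keeps its clean mean and variance almost unchanged, while a channel dominated by noise has its mean pulled toward the noise level and its variance shrunk toward zero. The same linearization could in principle be applied either to incoming feature vectors or to HMM Gaussian parameters, which corresponds to the two uses mentioned in the abstract.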