MVA Processing of Speech Features

Authors:
Chia-Ping Chen; Jeff A. Bilmes
Affiliations:
Univ. of Washington, Seattle, WA;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2007

Abstract

In this paper, we investigate a technique consisting of mean subtraction, variance normalization, and time-sequence filtering. Unlike other techniques, it applies autoregressive moving-average (ARMA) filtering directly in the cepstral domain. We call this technique mean subtraction, variance normalization, and ARMA filtering (MVA) post-processing, and we call speech features with MVA post-processing MVA features. Overall, compared to raw features without post-processing, MVA features achieve an error rate reduction of 45% on matched tasks and 65% on mismatched tasks on the Aurora 2.0 noisy speech database, and an average 57% error reduction on the Aurora 3.0 database. These improvements are comparable to the results of much more complicated techniques, even though MVA is relatively simple and requires practically no additional computational cost. In addition to describing MVA processing, we present a novel analysis of the distortion of mel-frequency cepstral coefficients and the log energy in the presence of different types of noise. The effectiveness of MVA is extensively investigated with respect to several variations: the type of raw features and the configurations used to extract them, the domains where MVA is applied, the filters that are used, the ARMA filter orders, and the causality of the normalization process. Specifically, it is argued and demonstrated that MVA works better when applied to the zeroth-order cepstral coefficient than to log energy, that MVA works better in the cepstral domain, that an ARMA filter is better than either a designed finite impulse response filter or a data-driven filter, and that a five-tap ARMA filter is sufficient to achieve good performance in a variety of settings. We also investigate and evaluate a multi-domain MVA generalization.
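
The three MVA stages named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the filter equation, so several details here are assumptions rather than the authors' exact method: the statistics are computed per utterance (non-causal), the ARMA stage is taken to be an order-M smoother of the form y[t] = (y[t-M] + ... + y[t-1] + x[t] + ... + x[t+M]) / (2M + 1), and the first and last M frames are left unfiltered. The function name `mva` and the parameters `order` and `eps` are hypothetical. With `order=2` the filter combines five terms, which is one reading of the "five-tap" filter mentioned above.

```python
import numpy as np

def mva(features, order=2, eps=1e-8):
    """Apply mean subtraction, variance normalization, and ARMA smoothing.

    features: array of shape (T, D), one row per frame of a single utterance.
    order:    assumed ARMA order M; the filter spans 2*M + 1 terms.
    eps:      small constant guarding against division by zero.
    """
    x = np.asarray(features, dtype=float)

    # Mean subtraction: remove the per-dimension utterance mean.
    x = x - x.mean(axis=0)

    # Variance normalization: scale each dimension to unit standard deviation.
    x = x / (x.std(axis=0) + eps)

    # ARMA filtering (assumed form): average the M previous *outputs* with the
    # current and M future *inputs*. Edge frames are left unfiltered.
    num_frames = x.shape[0]
    y = x.copy()
    for t in range(order, num_frames - order):
        past_outputs = y[t - order:t].sum(axis=0)
        current_and_future_inputs = x[t:t + order + 1].sum(axis=0)
        y[t] = (past_outputs + current_and_future_inputs) / (2 * order + 1)
    return y
```

A typical call would pass a matrix of cepstral features for one utterance, for example `mva(cepstra, order=2)`, where `cepstra` is a hypothetical (frames x coefficients) array. Setting `order=0` disables the smoothing and returns plain mean- and variance-normalized features, which makes it easy to compare the contribution of the filtering stage.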