Preprocessing of independent vector analysis using feed-forward network for robust speech recognition

Authors:
Myungwoo Oh;Hyung-Min Park
Affiliations:
Department of Electronic Engineering, Sogang University, Seoul, Republic of Korea;Department of Electronic Engineering, Sogang University, Seoul, Republic of Korea
Venue:
ICONIP'11 Proceedings of the 18th international conference on Neural Information Processing - Volume Part II
Year:
2011

Abstract

This paper describes a preprocessing algorithm for robust speech recognition based on independent vector analysis (IVA) using a feed-forward network. Within the IVA framework, a feed-forward network can be used as a separating system to achieve successful separation of highly reverberant mixtures. For robust speech recognition, we use cluster-based missing-feature reconstruction, based on log-spectral features of the separated speech, in the process of extracting mel-frequency cepstral coefficients. The algorithm identifies corrupted time-frequency segments as those with low signal-to-noise ratios, calculated from the log-spectral features of the separated speech and the observed noisy speech. The corrupted segments are filled in by bounded estimation based on the possibly reliable log-spectral features and on knowledge of the pre-trained log-spectral feature clusters. Experimental results demonstrate that the proposed method significantly enhances recognition performance in noisy environments.
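The masking and filling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give these details. The sketch assumes natural-log power spectra and a 0 dB reliability threshold. It treats the residual between observed and separated power as noise, uses diagonal-Gaussian clusters, and replaces the bounded estimate with a simplified rule: the best cluster's mean, capped by the observed value. The function names `local_snr_db`, `reliability_mask`, and `reconstruct_frame` are hypothetical.

```python
import math


def local_snr_db(sep_logspec, obs_logspec, floor=1e-10):
    """Per-bin SNR in dB for one frame of natural-log power spectra.

    Assumption: whatever observed power is not explained by the
    separated speech is treated as noise.
    """
    snrs = []
    for s, y in zip(sep_logspec, obs_logspec):
        speech = math.exp(s)
        noise = max(math.exp(y) - speech, floor)
        snrs.append(10.0 * math.log10(speech / noise))
    return snrs


def reliability_mask(sep_logspec, obs_logspec, threshold_db=0.0):
    """True where a bin is considered reliable (SNR above the threshold)."""
    snrs = local_snr_db(sep_logspec, obs_logspec)
    return [snr > threshold_db for snr in snrs]


def reconstruct_frame(obs_logspec, mask, clusters):
    """Fill unreliable bins using the best-matching pre-trained cluster.

    clusters: list of (weight, means, variances) diagonal Gaussians
    (an assumed representation of the log-spectral feature clusters).
    The cluster is chosen from the reliable bins only; each unreliable
    bin gets the cluster mean, bounded above by the observed value.
    """
    best_means, best_score = None, -math.inf
    for weight, means, variances in clusters:
        score = math.log(weight)
        for y, ok, m, v in zip(obs_logspec, mask, means, variances):
            if ok:
                score += -0.5 * (math.log(2.0 * math.pi * v) + (y - m) ** 2 / v)
        if score > best_score:
            best_means, best_score = means, score
    return [y if ok else min(m, y)
            for y, ok, m in zip(obs_logspec, mask, best_means)]


# Toy frame with two frequency bins; all numbers are illustrative only.
sep = [math.log(4.0), math.log(0.1)]   # separated speech
obs = [math.log(5.0), math.log(5.0)]   # observed noisy speech
clusters = [
    (0.5, [math.log(5.0), -1.0], [1.0, 1.0]),
    (0.5, [-3.0, 3.0], [1.0, 1.0]),
]
mask = reliability_mask(sep, obs)
filled = reconstruct_frame(obs, mask, clusters)
```

The upper bound reflects the usual missing-feature assumption that clean speech energy in a bin cannot exceed the noisy observation. A full bounded estimator would also use that bound when scoring clusters, which this sketch omits for brevity.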