Microphone arrays and neural networks for robust speech recognition

Authors:
C. Che; Q. Lin; J. Pearson; B. de Vries; J. Flanagan
Affiliations:
Rutgers University, Piscataway, NJ; Rutgers University, Piscataway, NJ; David Sarnoff Research Center, Princeton, NJ; David Sarnoff Research Center, Princeton, NJ; Rutgers University, Piscataway, NJ
Venue:
HLT '94 Proceedings of the workshop on Human Language Technology
Year:
1994

Abstract

This paper explores the use of synergistically integrated systems of microphone arrays and neural networks for robust speech recognition in variable acoustic environments, where the user must not be encumbered by microphone equipment. Existing speech recognizers work best for "high-quality close-talking speech." Their performance is typically degraded by environmental interference and by mismatch between training and testing conditions. We find that microphone arrays and neural network processors can raise the recognition performance of existing speech recognizers in an adverse acoustic environment, thus avoiding the need to retrain the recognizer, a complex and tedious task. We also present results showing that a system of microphone arrays and neural networks can achieve higher word recognition accuracy in an unmatched training/testing condition than a retrained speech recognizer that uses array speech for both training and testing, i.e., a matched training/testing condition.