Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition

Authors:
Jen-Tzung Chien;Jain-Ray Lai
Affiliations:
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 70101 Taiwan, Republic of China;Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 70101 Taiwan, Republic of China
Venue:
Journal of VLSI Signal Processing Systems
Year:
2004


Abstract

This paper presents a combined microphone array and model adaptation algorithm for hands-free speech recognition. Our purpose is to remove the inconvenience of using a head-mounted or hand-held microphone with a conventional speech recognizer. To improve speech quality under car noise interference, a linear microphone array is used as a robust acquisition system. A time-domain coherence measure (TDCM) is applied to reliably estimate the time delay between speech signals collected by different microphones. The estimated delay is then used in a delay-and-sum beamformer for speech enhancement. Further, we adapt the speech hidden Markov models toward the acoustic conditions of the enhanced test speech for robust speech recognition. In acquisition and recognition experiments using connected Chinese digits, we found that TDCM can effectively estimate the time delay and that increasing the speech sampling rate helps determine the time delay. Incorporating the model adaptation scheme significantly reduces recognition errors with moderate computational overhead.
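
The acquisition stage described in the abstract (estimate inter-microphone delays, then align and average the channels) can be illustrated with a minimal sketch. The abstract does not define TDCM, so the sketch substitutes a plain normalized time-domain cross-correlation as a stand-in; it is not the authors' measure. The function names `estimate_delay` and `delay_and_sum`, the `max_lag` parameter, and the restriction to integer-sample delays are all assumptions made for illustration.

```python
import numpy as np


def estimate_delay(ref, sig, max_lag):
    """Return the integer lag (in samples) by which `sig` trails `ref`.

    Stand-in for TDCM (assumption): picks the lag that maximizes the
    normalized cross-correlation between the two equal-length signals.
    """
    ref = np.asarray(ref, dtype=float)
    sig = np.asarray(sig, dtype=float)
    n = len(ref)
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = ref[:n - lag], sig[lag:]
        else:
            a, b = ref[-lag:], sig[:n + lag]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        score = np.dot(a, b) / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, lags):
    """Align each channel by its estimated lag and average the result.

    Samples shifted in from outside the recording are left as zeros.
    """
    n = len(channels[0])
    out = np.zeros(n)
    for x, lag in zip(channels, lags):
        x = np.asarray(x, dtype=float)
        aligned = np.zeros(n)
        if lag >= 0:
            aligned[:n - lag] = x[lag:]   # advance a late channel
        else:
            aligned[-lag:] = x[:n + lag]  # delay an early channel
        out += aligned
    return out / len(channels)
```

In use, one channel would serve as the reference: `lags = [estimate_delay(channels[0], x, max_lag) for x in channels]`, followed by `enhanced = delay_and_sum(channels, lags)`. Because this sketch searches integer lags only, its delay resolution is one sample period (1/fs), which is one plausible reason a higher sampling rate would help delay estimation; the abstract itself does not state the mechanism.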