A robust method to extract talker azimuth orientation using a large-aperture microphone array

  • Authors:
  • Avram Levi; Harvey Silverman

  • Affiliations:
  • Laboratory for Engineering Man/Machine Systems, Brown University, Providence, RI (both authors)

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2010


Abstract

Knowing the orientation of a talker in the focal area of a large-aperture microphone array enables better beamforming algorithms (yielding higher-quality speech output), improves source-location and tracking algorithms, and allows better selection and control of cameras in videoconferencing. Measurements in an anechoic room (e.g., Chu and Warnock, 2002) have quantified the average frequency-dependent magnitude (source radiation pattern) of the human speech source, showing a front-to-back difference in magnitude that grows with frequency at about 8 dB per decade, reaching roughly 18 dB at 8000 Hz. These amplitude differences, while severely masked by both coherent and noncoherent noise in a real environment, are the most readily extractable cue to a talker's orientation, compared with alternatives such as phase differences at the source or diffraction effects at the mouth. In this paper, we propose a robust, source-radiation-pattern-based method for extracting the azimuth angle of a single talker for whom an accurate point-source location estimate is known. The method requires no a priori training and has been tested in more than 100 situations with real human talkers at various locations and orientations in a room equipped with a large-aperture microphone array. We compare these results against earlier published algorithms and find that the proposed method is the most robust and is suitable for a real-time system.
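
To make the radiation-pattern idea concrete, here is a minimal sketch, not the paper's algorithm: it grid-searches for the azimuth that best explains distance-compensated per-microphone levels under an assumed cardioid-like pattern scaled to the ~18 dB front-to-back figure quoted above. The microphone layout, level values, and pattern model are all hypothetical stand-ins.

```python
import numpy as np

# Hypothetical illustration only -- not the algorithm from the paper.
# Grid-search the talker azimuth that best explains distance-compensated
# microphone levels under an assumed cardioid-like radiation pattern.

def radiation_db(phi):
    """Assumed pattern: 0 dB on-axis, ~18 dB down directly behind,
    roughly matching the high-frequency figure cited in the abstract."""
    return -18.0 * (1.0 - np.cos(phi)) / 2.0

def estimate_azimuth(mic_xy, src_xy, levels_db):
    """Return the azimuth (radians) minimizing the least-squares misfit
    between measured levels and the assumed pattern, after removing an
    unknown common gain (overall talker loudness)."""
    bearings = np.arctan2(mic_xy[:, 1] - src_xy[1],
                          mic_xy[:, 0] - src_xy[0])
    dists = np.linalg.norm(mic_xy - src_xy, axis=1)
    comp = levels_db + 20.0 * np.log10(dists)  # undo 1/r spreading loss
    best_theta, best_err = 0.0, np.inf
    for theta in np.linspace(-np.pi, np.pi, 360, endpoint=False):
        resid = comp - radiation_db(bearings - theta)
        resid -= resid.mean()                  # absorb the unknown gain
        err = float(resid @ resid)
        if err < best_err:
            best_theta, best_err = theta, err
    return best_theta

# Toy check: four mics around a talker at the origin facing +x.
mics = np.array([[3.0, 0.0], [0.0, 3.0], [-3.0, 0.0], [0.0, -3.0]])
levels = (radiation_db(np.arctan2(mics[:, 1], mics[:, 0]))
          - 20.0 * np.log10(np.linalg.norm(mics, axis=1)))
print(np.degrees(estimate_azimuth(mics, np.zeros(2), levels)))  # ~0.0
```

In this toy setup the search recovers the facing direction exactly; the paper's contribution is making such an estimate robust when the level differences are masked by coherent and noncoherent room noise.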