LISTEN: A System for Locating and Tracking Individual Speakers

  • Authors:
  • M. Collobert; R. Feraud; G. Le Tourneur; O. Bernier; J. E. Viallet; Y. Mahieux; D. Collobert

  • Venue:
  • FG '96 Proceedings of the 2nd International Conference on Automatic Face and Gesture Recognition (FG '96)
  • Year:
  • 1996

Abstract

Both visual and acoustic information provide effective means of telecommunication between people. In this context, the face is the most important part of a person, both visually and acoustically. We describe how the cooperation of image and audio processing makes it possible to track a person's face and to collect the audio information it produces. We present techniques for detecting regions of interest (e.g. moving regions of skin color), coupled with a neural-network-based face detector with a low false-alarm rate, to locate and track faces. The system is connected to a nine-microphone array whose adaptive beamformer is steered immediately toward the located face. Visual and acoustic information from the speaker's face is thus obtained in real time.
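The core audio idea of the abstract, steering a microphone array toward a visually tracked face, can be illustrated with a minimal delay-and-sum beamformer. This is a generic sketch, not the paper's adaptive algorithm: the array geometry (9 mics, 4 cm spacing), the sampling rate, and all function names below are illustrative assumptions.

```python
# Hedged sketch: delay-and-sum beamforming for a 9-microphone linear array,
# steered toward a face position that a visual tracker would supply.
# All parameters and names are illustrative, not taken from the paper.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature (assumption)

def steering_delays(mic_x, source_x, source_dist, fs):
    """Integer sample delays that time-align each mic to the source.

    mic_x: 1-D array of microphone x-positions (m) on a linear array.
    source_x, source_dist: source position relative to the array centre (m).
    fs: sampling rate (Hz).
    """
    # Distance from the source to each microphone (near-field form).
    d = np.hypot(mic_x - source_x, source_dist)
    tau = (d - d.min()) / SPEED_OF_SOUND      # relative delays in seconds
    return np.round(tau * fs).astype(int)     # delays in whole samples

def delay_and_sum(signals, delays):
    """Shift each channel by its delay, then average across channels."""
    n = signals.shape[1] - delays.max()
    aligned = np.stack([s[d:d + n] for s, d in zip(signals, delays)])
    return aligned.mean(axis=0)

# Usage: 9 mics at 4 cm spacing; face located 1 m away, 0.2 m to the right.
fs = 16000
mic_x = (np.arange(9) - 4) * 0.04
delays = steering_delays(mic_x, source_x=0.2, source_dist=1.0, fs=fs)
```

Delay-and-sum reinforces sound arriving from the steering direction while averaging down sound from elsewhere; an adaptive beamformer such as the paper's additionally updates its weights to suppress interfering sources.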