Audio-visual intent-to-speak detection for human-computer interaction

Authors:
P. De Cuetos; C. Neti; A. W. Senior
Affiliations:
Inst. Eurecom, Sophia-Antipolis, France; -; -
Venue:
ICASSP '00: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2000, Volume 04
Year:
2000

Abstract

This paper introduces a practical system that detects a user's intent to speak to a computer by considering both audio and visual cues. The system is designed to turn the microphone on for speech recognition intuitively, without the user having to click a mouse, which makes communication between users and computers more human-like. The first step detects a frontal face in the image from a simple desktop video camera, using well-known image processing techniques for face and facial-feature detection on a single image. The second step is audio-visual speech event detection, which combines visual and audio indications of speech. In this paper, we consider visual measures of speech activity as well as audio energy to determine whether the previously detected user is actually speaking.
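The abstract does not specify which visual measures, thresholds, or fusion rule the system uses. The following is only a minimal sketch of the second step under stated assumptions. It assumes that face detection has already produced a grayscale mouth-region crop for every video frame. The visual cue is taken to be the mean absolute intensity change between consecutive mouth crops, and the audio cue is short-time energy in dB. A frame counts as speech only when both cues exceed their thresholds (an AND rule), and a small majority vote then smooths the per-frame decisions. The function names (`frame_energy_db`, `mouth_activity`, `detect_speech`) and all threshold values are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]            # one block of audio samples in [-1, 1]
Image = Sequence[Sequence[float]]  # grayscale mouth-region crop, values 0..255


def frame_energy_db(samples: Frame) -> float:
    """Short-time audio energy of one frame, in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / max(len(samples), 1)
    return 10.0 * math.log10(max(mean_square, 1e-12))  # floor avoids log10(0)


def mouth_activity(prev_roi: Image, roi: Image) -> float:
    """Hypothetical visual cue: mean absolute intensity change between
    consecutive mouth-region crops of the previously detected face."""
    diffs = [abs(c - p) for prev_row, row in zip(prev_roi, roi)
             for p, c in zip(prev_row, row)]
    return sum(diffs) / max(len(diffs), 1)


def detect_speech(audio_frames: List[Frame], mouth_rois: List[Image],
                  energy_thresh_db: float = -30.0,   # assumed threshold
                  activity_thresh: float = 4.0,      # assumed threshold
                  window: int = 3) -> List[bool]:
    """Per-frame 'detected user is speaking' decisions from fused cues.

    A frame is a raw speech candidate only if the audio is loud enough AND
    the user's mouth is moving (so loud audio with a still mouth, such as
    another person talking, is rejected). A centered majority vote over
    `window` frames then removes isolated flips.
    """
    n = min(len(audio_frames), len(mouth_rois))
    raw: List[bool] = [False] if n > 0 else []  # frame 0 has no previous crop
    for t in range(1, n):
        loud = frame_energy_db(audio_frames[t]) > energy_thresh_db
        moving = mouth_activity(mouth_rois[t - 1], mouth_rois[t]) > activity_thresh
        raw.append(loud and moving)

    half = window // 2
    smoothed: List[bool] = []
    for t in range(len(raw)):
        votes = raw[max(0, t - half): t + half + 1]
        smoothed.append(sum(votes) * 2 > len(votes))  # strict majority
    return smoothed
```

For example, with silent audio at the start and end, loud audio in between, and mouth crops that change only while the audio is loud, `detect_speech` marks exactly the loud, moving frames as speech. The AND rule favors rejecting false triggers, which matters when the microphone is turned on without an explicit mouse click. A weighted or probabilistic combination of the two cues would be an equally plausible design.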