Voice activity detection using audio-visual information

  • Authors:
  • Theodoros Petsatodis; Aristodemos Pnevmatikakis; Christos Boukis

  • Affiliations:
  • University of Aalborg, CTiF, and Athens Information Technology; Athens Information Technology, Peania, Greece; Athens Information Technology, Peania, Greece

  • Venue:
  • DSP'09: Proceedings of the 16th International Conference on Digital Signal Processing
  • Year:
  • 2009

Abstract

An audio-visual voice activity detector that uses sensors positioned far from the speaker is presented. Its constituent unimodal detectors model the temporal variation of audio and visual features with Hidden Markov Models, and their outcomes are fused using a post-decision scheme. The Mel-Frequency Cepstral Coefficients and the vertical mouth opening are the chosen audio and visual features respectively, both augmented with their first-order derivatives. The proposed system is assessed on far-field recordings from four different speakers under various levels of additive white Gaussian noise, achieving performance superior to that of either unimodal component alone.
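
The sketch below illustrates the general pipeline the abstract describes: MFCC and vertical-mouth-opening features augmented with first-order derivatives, per-modality speech/non-speech HMM scoring, and a post-decision (late) fusion of the two scores. It is not the authors' implementation; the library choices (librosa, hmmlearn), the fusion weighting, and all parameter values are illustrative assumptions.

```python
# Illustrative sketch of an audio-visual VAD with HMM-based unimodal detectors
# and post-decision fusion. Libraries and parameters are assumptions, not the
# paper's actual configuration.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def audio_features(wav_path, sr=16000, n_mfcc=13):
    """MFCCs augmented with their first-order derivatives, frames as rows."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc)
    return np.vstack([mfcc, delta]).T            # shape: (n_frames, 2 * n_mfcc)

def visual_features(mouth_opening):
    """Vertical mouth opening per video frame plus its first derivative."""
    opening = np.asarray(mouth_opening, dtype=float)
    return np.column_stack([opening, np.gradient(opening)])

def train_hmm(sequences, n_states=3):
    """Fit one HMM per class (speech or non-speech) on labelled segments."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    return GaussianHMM(n_components=n_states, covariance_type="diag",
                       n_iter=50).fit(X, lengths)

def log_likelihood_ratio(speech_hmm, nonspeech_hmm, feats):
    """Evidence for 'speech' versus 'non-speech' on one feature segment."""
    return speech_hmm.score(feats) - nonspeech_hmm.score(feats)

def fuse_decisions(audio_llr, visual_llr, w_audio=0.5):
    """Post-decision fusion: weighted combination of the unimodal scores."""
    fused = w_audio * audio_llr + (1.0 - w_audio) * visual_llr
    return fused > 0.0                            # True -> voice activity
```

In this kind of scheme each modality is scored independently, so the visual detector can keep the system usable when additive acoustic noise degrades the audio likelihoods, which is the behaviour the evaluation under white Gaussian noise targets.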