This work proposes a dominance detection framework that operates in reverberated environments. The framework comprises a speech enhancement front-end, which automatically reduces the distortions that room reverberation introduces into the speech signals, and a dominance detector, which processes the enhanced signals and estimates the most and least dominant person in a segment. The front-end consists of three cooperating blocks: speaker diarization, room impulse response identification, and speech dereverberation. The dominance estimation algorithm is based on bidirectional Long Short-Term Memory (BLSTM) networks, which allow context-sensitive activity classification from audio feature functionals extracted with the real-time speech feature extraction toolkit openSMILE. Experiments have been performed by suitably reverberating the DOME dataset: averaged over the addressed reverberated conditions, the absolute accuracy improvement is 32.68% on the most dominant person estimation task and 36.56% on the least dominant person estimation task, both with full agreement among annotators.
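To make the feature pipeline concrete, the sketch below illustrates the general idea of openSMILE-style audio feature functionals: per-frame low-level descriptors (LLDs) are computed over short windows, and segment-level statistical functionals then collapse each LLD trajectory into one fixed-length vector suitable for a classifier. This is a minimal numpy illustration under assumed parameters (16 kHz audio, 25 ms frames with a 10 ms hop, RMS energy and zero-crossing rate as LLDs); the function names are illustrative and do not correspond to openSMILE's actual API or configuration files.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms assumed at 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def low_level_descriptors(frames):
    """Per-frame LLDs: RMS energy and zero-crossing rate (a tiny stand-in
    for the much larger LLD set a real openSMILE configuration extracts)."""
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.stack([rms, zcr], axis=1)          # shape: (n_frames, n_llds)

def functionals(llds):
    """Segment-level functionals (mean, std, min, max) over each LLD trajectory,
    yielding one fixed-length feature vector per segment."""
    return np.concatenate([llds.mean(0), llds.std(0), llds.min(0), llds.max(0)])

# Example: one synthetic 1-second segment at 16 kHz
rng = np.random.default_rng(0)
segment = rng.standard_normal(16000)
feats = functionals(low_level_descriptors(frame_signal(segment)))
print(feats.shape)  # 2 LLDs x 4 functionals -> (8,)
```

Feeding such per-segment functional vectors (in practice, many more of them) to a sequence model such as a BLSTM is what allows the classifier to exploit context across neighboring segments.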