This work proposes a dominance detection framework that operates in reverberated environments. The framework comprises a speech enhancement front-end, which automatically reduces the distortions that room reverberation introduces into the speech signals, and a dominance detector, which processes the enhanced signals and estimates the most and least dominant person in a segment. The front-end consists of three cooperating blocks: speaker diarization, room impulse response identification, and speech dereverberation. The dominance estimation algorithm is based on bidirectional Long Short-Term Memory (BLSTM) networks, which allow context-sensitive activity classification from audio feature functionals extracted with the real-time speech feature extraction toolkit openSMILE. Experiments have been performed by suitably reverberating the DOME dataset: averaged over the addressed reverberated conditions, the absolute accuracy improvement is 32.68% on the most dominant person estimation task and 36.56% on the least dominant person estimation task, both with full agreement among annotators.
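To make the feature pipeline concrete, the sketch below illustrates the general idea of openSMILE-style audio feature functionals: per-frame low-level descriptors (LLDs) are computed over short windows, and segment-level statistical functionals then collapse each LLD trajectory into one fixed-length vector suitable for a classifier. This is a minimal numpy illustration under assumed parameters (16 kHz audio, 25 ms frames with a 10 ms hop, RMS energy and zero-crossing rate as LLDs); the function names are illustrative and do not correspond to openSMILE's actual API or configuration files.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms assumed at 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def low_level_descriptors(frames):
    """Per-frame LLDs: RMS energy and zero-crossing rate (a tiny stand-in
    for the much larger LLD set a real openSMILE configuration extracts)."""
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.stack([rms, zcr], axis=1)          # shape: (n_frames, n_llds)

def functionals(llds):
    """Segment-level functionals (mean, std, min, max) over each LLD trajectory,
    yielding one fixed-length feature vector per segment."""
    return np.concatenate([llds.mean(0), llds.std(0), llds.min(0), llds.max(0)])

# Example: one synthetic 1-second segment at 16 kHz
rng = np.random.default_rng(0)
segment = rng.standard_normal(16000)
feats = functionals(low_level_descriptors(frame_signal(segment)))
print(feats.shape)  # 2 LLDs x 4 functionals -> (8,)
```

Feeding such per-segment functional vectors (in practice, many more of them) to a sequence model such as a BLSTM is what allows the classifier to exploit context across neighboring segments.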