Estimating Dominance in Multi-Party Meetings Using Speaker Diarization

Authors:
H. Hung;Yan Huang;G. Friedland;D. Gatica-Perez
Affiliations:
Univ. of Amsterdam, Amsterdam, Netherlands;-;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2011

Citing 0
Cited 9

Dialocalization: Acoustic speaker diarization and visual localization as joint optimization problem
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)

Generative modeling and classification of dialogs by a low-level turn-taking feature
Pattern Recognition

Characterization of coordination in an imitation task: human evaluation and automatically computable cues
ICMI '11 Proceedings of the 13th international conference on multimodal interfaces

A real-time speech enhancement framework for multi-party meetings
NOLISP'11 Proceedings of the 5th international conference on Advances in nonlinear speech processing

Embodied cooperation using mobile devices: presenting and evaluating the Sync4All application
Proceedings of the International Working Conference on Advanced Visual Interfaces

Dominance detection in a reverberated acoustic scenario
ISNN'12 Proceedings of the 9th international conference on Advances in Neural Networks - Volume Part I

Multimodal prediction of expertise and leadership in learning groups
Proceedings of the 1st International Workshop on Multimodal Learning Analytics

SocioPhone: everyday face-to-face interaction monitoring platform using multi-phone sensor fusion
Proceeding of the 11th annual international conference on Mobile systems, applications, and services

TalkBetter: family-driven mobile intervention care for children with language delay
Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing

Abstract

With the increase in cheap, commercially available sensors, recording meetings is becoming an increasingly practical option. With this trend comes the need to summarize the recorded data in semantically meaningful ways. Here, we investigate the task of automatically measuring dominance in small group meetings when only a single audio source is available. Past research has found that speaking length, as a single feature, provides a very good estimate of dominance. For these tasks, we use speaker segmentations generated by our automated, faster-than-real-time speaker diarization algorithm, where the number of speakers is not known beforehand. From user-annotated data, we analyze how the inherent variability of the annotations affects the performance of our dominance estimation method. We primarily focus on examining how the performance of the speaker diarization and of our dominance tasks varies under different experimental conditions and computationally efficient strategies, and how this would affect a practical implementation of such a system. Despite the use of a state-of-the-art speaker diarization algorithm, speaker segments can be noisy. In experiments on almost 5 hours of audio-visual meeting data, our results show that the dominance estimation is robust to increasing diarization noise.
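
As a rough illustration of the speaking-length feature mentioned in the abstract, the sketch below sums the speaking time of each speaker from diarization output and picks the speaker with the largest total. It is a minimal sketch and not the authors' implementation: the segment format (speaker label, start time, end time in seconds), the function names `speaking_lengths` and `most_dominant`, and the sample data are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed segment format (hypothetical): (speaker label, start seconds, end seconds).
Segment = Tuple[str, float, float]


def speaking_lengths(segments: Iterable[Segment]) -> Dict[str, float]:
    """Sum the total speaking time per speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for speaker, start, end in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {speaker} {start} {end}")
        totals[speaker] += end - start
    return dict(totals)


def most_dominant(segments: Iterable[Segment]) -> str:
    """Return the speaker label with the largest total speaking length."""
    totals = speaking_lengths(segments)
    if not totals:
        raise ValueError("no segments given")
    return max(totals, key=totals.get)


# Made-up diarization output for a short three-speaker exchange.
segments = [
    ("spk0", 0.0, 12.5),
    ("spk1", 12.5, 15.0),
    ("spk0", 15.0, 30.0),
    ("spk2", 30.0, 34.0),
]

print(speaking_lengths(segments))
print(most_dominant(segments))
```

Because the speaker labels come from the diarization output itself, the number of speakers does not need to be fixed in advance, which matches the setting described in the abstract. Overlapping speech, if present in the segments, is simply counted once for each speaker involved.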