Location based speaker segmentation

Authors:
G. Lathoud; I. A. McCowan
Affiliations:
Dalle Molle Institute for Perceptual Artificial Intelligence, Martigny, Switzerland (both authors)
Venue:
ICME '03: Proceedings of the 2003 International Conference on Multimedia and Expo, Volume 3
Year:
2003

Abstract

This paper proposes a technique that segments audio according to speakers based on their location. In many multi-party conversations, such as meetings, the location of participants is restricted to a small number of regions, such as seats around a table, or at a whiteboard. In such cases, segmentation according to these discrete regions would be a reliable means of determining speaker turns. We propose a system that uses microphone pair time delays as features to represent speaker locations. These features are integrated in a GMM/HMM framework to determine an optimal segmentation of the audio according to location. The HMM framework also allows extensions to recognize more complex structure, such as the presence of two simultaneous speakers. Experiments testing the system on real recordings from a meeting room show that the proposed location features can provide greater discrimination than standard cepstral features, and also demonstrate the success of an extension to handle dual-speaker overlap.
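The idea of decoding speaker turns from time-delay features can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single microphone pair (so each frame has one scalar delay), replaces the GMM for each location with a single Gaussian, and uses a simple "stay or move" transition model. The function names, the delay values, the standard deviation and the `stay_prob` parameter are all hypothetical choices made for illustration.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a 1-D Gaussian (stand-in for a per-location GMM)."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def segment_by_location(delays, means, std=0.1, stay_prob=0.9):
    """Viterbi decoding with one HMM state per location.

    delays: per-frame time delay for one microphone pair (assumed units: ms)
    means:  expected delay for each location (assumed known in advance)
    """
    n_states = len(means)
    if n_states < 2 or not delays:
        raise ValueError("need at least two locations and one frame")
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (n_states - 1))

    # Uniform prior over locations for the first frame.
    score = [-math.log(n_states) + gaussian_log_pdf(delays[0], m, std)
             for m in means]
    backptrs = []
    for x in delays[1:]:
        new_score, ptr = [], []
        for j in range(n_states):
            # Best predecessor state for landing in state j.
            cand = [score[i] + (log_stay if i == j else log_move)
                    for i in range(n_states)]
            best_i = max(range(n_states), key=lambda i: cand[i])
            new_score.append(cand[best_i] + gaussian_log_pdf(x, means[j], std))
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)

    # Trace back the most likely state sequence.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def to_segments(path):
    """Collapse a frame-level path into (location, start, end_exclusive)."""
    segments, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((path[start], start, t))
            start = t
    return segments


if __name__ == "__main__":
    # Invented delays: three seats, with one noisy frame (-0.2) in the first turn.
    seat_means = [-0.5, 0.0, 0.5]
    frames = [-0.52, -0.48, -0.2, -0.49, 0.51, 0.48, 0.5, 0.02, -0.01, 0.0]
    print(to_segments(segment_by_location(frames, seat_means)))
```

In this toy input the noisy third frame lies closer to the second seat's mean, so a frame-by-frame classifier would briefly switch location there. The transition penalty in the HMM keeps it within the first speaker's turn, which is the kind of smoothing a sequence model contributes on top of per-frame location likelihoods.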