Comparison of segmentation and clustering methods for speaker diarization of broadcast stream audio

Authors:
Jan Prazak; Jan Silovsky
Affiliations:
Institute of Information Technology and Electronics, Faculty of Mechatronics, Technical University of Liberec, Liberec, Czech Republic (both authors)
Venue:
COST'10 Proceedings of the 2010 international conference on Analysis of Verbal and Nonverbal Communication and Enactment
Year:
2010

Abstract

This paper investigates approaches to segmenting media streams into speaker-homogeneous segments and approaches to clustering speakers within a speaker diarization system for broadcast audio. All evaluated segmentation approaches are based on the widely used Bayesian Information Criterion (BIC). They differ in the strategy for choosing the window length (fixed or variable) and in the strategy for estimating the decision threshold (fixed or adaptive). Further, we compare two bottom-up clustering approaches: traditional BIC-based clustering and an approach based on a distance measure between Gaussian mixture models (GMMs) that are estimated from each cluster's data by Maximum A Posteriori (MAP) adaptation.
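
As an illustration of the BIC-based segmentation idea mentioned in the abstract, the following is a minimal sketch of a single-window change-point test. It is not the authors' implementation. It assumes one Gaussian with a diagonal covariance per segment, a fixed window, and a fixed threshold of zero with a tunable penalty weight. The names `log_det_diag`, `delta_bic`, `find_change_point`, `penalty_weight`, and `margin` are hypothetical, and the two-dimensional synthetic "features" stand in for real acoustic features.

```python
import math
import random


def log_det_diag(frames):
    """Log-determinant of the maximum-likelihood diagonal covariance."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0)
    return total


def delta_bic(frames, i, penalty_weight=1.0):
    """Delta-BIC for splitting the window at frame i (positive favors a split).

    Assumption: diagonal-covariance Gaussians, so each model has 2 * dim
    free parameters (means and variances).
    """
    n = len(frames)
    dim = len(frames[0])
    left, right = frames[:i], frames[i:]
    gain = 0.5 * (
        n * log_det_diag(frames)
        - len(left) * log_det_diag(left)
        - len(right) * log_det_diag(right)
    )
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


def find_change_point(frames, margin=10, penalty_weight=1.0):
    """Return the split index with the largest positive Delta-BIC, or None."""
    best_i, best_score = None, 0.0
    for i in range(margin, len(frames) - margin):
        score = delta_bic(frames, i, penalty_weight)
        if score > best_score:
            best_i, best_score = i, score
    return best_i


# Synthetic stream: 100 frames of one "speaker" followed by 100 of another.
rng = random.Random(0)
speaker_a = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(100)]
speaker_b = [[rng.gauss(5.0, 1.0) for _ in range(2)] for _ in range(100)]
stream = speaker_a + speaker_b
change_point = find_change_point(stream)
print("detected change point at frame:", change_point)
```

In this sketch, raising `penalty_weight` makes the test more conservative, which plays the role of the decision threshold. A variable-window strategy would grow the window until a change is found, and an adaptive-threshold strategy would set the threshold from the data; neither is shown here.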