Comparison of segmentation and clustering methods for speaker diarization of broadcast stream audio

  • Authors:
  • Jan Prazak;Jan Silovsky

  • Affiliations:
  • Institute of Information Technology and Electronics, Faculty of Mechatronics, Technical University of Liberec, Liberec, Czech Republic;Institute of Information Technology and Electronics, Faculty of Mechatronics, Technical University of Liberec, Liberec, Czech Republic

  • Venue:
  • COST'10 Proceedings of the 2010 international conference on Analysis of Verbal and Nonverbal Communication and Enactment
  • Year:
  • 2010

Abstract

This paper investigates approaches to segmenting media streams into speaker-homogeneous segments, and approaches to clustering speakers, within a speaker diarization system for processing broadcast audio. All evaluated segmentation approaches are based on the widely used Bayesian Information Criterion (BIC). They differ in how the length of the analysis window is chosen (fixed or variable) and in how the decision threshold is estimated (fixed or adaptive). We also compare two bottom-up clustering approaches: traditional BIC-based clustering, and an approach based on a distance measure between Gaussian mixture models (GMMs) estimated from each cluster's data by Maximum A Posteriori (MAP) adaptation.
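
To make the BIC-based segmentation idea concrete, the following is a minimal sketch of the commonly used ΔBIC test for a single candidate speaker change inside one analysis window. It is not the authors' implementation. It assumes each segment is modeled by one full-covariance Gaussian, and the function names (delta_bic, best_change_point), the penalty_weight parameter, and the margin parameter are hypothetical names introduced here for illustration. A positive ΔBIC favors the hypothesis that the window contains two speakers; the fixed-threshold variant mentioned in the abstract would correspond to comparing the score against a constant, here zero.

```python
import numpy as np


def delta_bic(frames, i, penalty_weight=1.0):
    """Delta-BIC for a hypothesized change at frame index i.

    frames: (N, d) array of feature vectors (for example MFCCs; an assumption).
    penalty_weight: hypothetical tuning factor applied to the BIC penalty.
    Positive return values favor splitting the window at i.
    """
    n, d = frames.shape

    def logdet_cov(x):
        # Log-determinant of the maximum-likelihood full covariance estimate.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    left, right = frames[:i], frames[i:]
    # Extra parameters of the two-model hypothesis: d means plus
    # d(d+1)/2 covariance entries.
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet_cov(frames)
            - 0.5 * len(left) * logdet_cov(left)
            - 0.5 * len(right) * logdet_cov(right)
            - penalty)


def best_change_point(frames, margin=20, penalty_weight=1.0):
    """Scan one fixed window and return (index, score) of the highest Delta-BIC.

    margin keeps enough frames on each side for a stable covariance estimate.
    """
    candidates = range(margin, len(frames) - margin)
    scores = [delta_bic(frames, i, penalty_weight) for i in candidates]
    k = int(np.argmax(scores))
    return margin + k, scores[k]
```

A caller would accept the returned index as a speaker change only when its score is positive (or exceeds whatever threshold the system uses). The same ΔBIC quantity is also what traditional BIC-based bottom-up clustering typically evaluates when deciding whether two clusters should be merged.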