An overview of automatic speaker diarization systems
IEEE Transactions on Audio, Speech, and Language Processing
This paper investigates approaches to segmenting media streams into speaker-homogeneous segments and approaches to clustering speakers within a speaker diarization system for broadcast audio. All evaluated segmentation approaches are based on the widely used Bayesian Information Criterion (BIC). They differ in the strategy for choosing the window length (fixed or variable) and in the strategy for estimating the decision threshold (fixed or adaptive). Further, we compare two bottom-up clustering approaches: traditional BIC-based clustering and an approach based on a distance measure between GMMs estimated for the data of each cluster by Maximum A Posteriori (MAP) adaptation.
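The BIC test underlying both the segmentation and the baseline clustering compares modelling a stretch of feature frames with one Gaussian against modelling it with two. The following is a minimal, stdlib-only sketch of that test, not the paper's implementation. It assumes diagonal covariances (the common formulation uses full covariances, which changes the parameter count in the penalty), a penalty weight `lam`, and a `margin` of frames kept on each side of a candidate split. It covers only the fixed-window, fixed-threshold segmentation variant and BIC-based agglomerative clustering. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Sketch only: diagonal-covariance Gaussians are assumed throughout.
Frame = Sequence[float]  # one feature vector (e.g. MFCCs) per frame


def log_det_diag(frames: Sequence[Frame], floor: float = 1e-6) -> float:
    """Log-determinant of the ML diagonal covariance estimated from `frames`."""
    n, d = len(frames), len(frames[0])
    total = 0.0
    for k in range(d):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, floor))  # floor avoids log(0)
    return total


def delta_bic(x: Sequence[Frame], y: Sequence[Frame], lam: float = 1.0) -> float:
    """BIC(two Gaussians) - BIC(one Gaussian) for the data x followed by y.

    Positive: two models are preferred (a change point / do not merge).
    Negative: a single model is preferred (no change / merge).
    A diagonal Gaussian has 2*d parameters, so the second model adds 2*d.
    """
    z = list(x) + list(y)
    n, n1, n2, d = len(z), len(x), len(y), len(z[0])
    gain = 0.5 * (n * log_det_diag(z) - n1 * log_det_diag(x) - n2 * log_det_diag(y))
    penalty = 0.5 * lam * (2 * d) * math.log(n)
    return gain - penalty


def best_split(window: Sequence[Frame], lam: float = 1.0,
               margin: int = 10) -> Optional[Tuple[int, float]]:
    """Fixed-window, fixed-threshold change detection.

    Scans every split point that leaves at least `margin` frames on each side
    and returns (index, delta_bic) of the best one if its delta-BIC is
    positive, otherwise None.
    """
    best: Optional[Tuple[int, float]] = None
    for i in range(margin, len(window) - margin + 1):
        score = delta_bic(window[:i], window[i:], lam)
        if best is None or score > best[1]:
            best = (i, score)
    return best if best is not None and best[1] > 0 else None


def bic_clustering(segments: List[List[Frame]], lam: float = 1.0) -> List[List[int]]:
    """Bottom-up clustering: merge the pair with the lowest delta-BIC while it
    is negative. Returns the segment indices that form each final cluster."""
    clusters = [[i] for i in range(len(segments))]
    data = [list(seg) for seg in segments]  # copies, inputs are not mutated
    while len(clusters) > 1:
        pairs = [(delta_bic(data[a], data[b], lam), a, b)
                 for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        score, a, b = min(pairs)
        if score >= 0:  # every remaining merge would be rejected by BIC: stop
            break
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
        data[a].extend(data.pop(b))
    return clusters
```

As a usage example, take two synthetic 2-D "speakers", `a = [(0.0, 0.0) if t % 2 == 0 else (1.0, 1.0) for t in range(60)]` and `b = [(5.0, 5.0) if t % 2 == 0 else (6.0, 6.0) for t in range(60)]`. Here `best_split(a + b)` reports a change at index 60, `best_split(a)` returns `None`, and `bic_clustering([a, b, list(a)])` returns `[[0, 2], [1]]`. Raising `lam` makes both change detection and cluster separation more conservative; the adaptive-threshold, variable-window and MAP/GMM-distance variants compared in the paper are not modelled here.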