Speaker diarization is defined as the task of determining "who spoke when" given an audio track and no other prior knowledge of any kind. This article shows how a state-of-the-art speaker diarization system can be improved by combining traditional short-term features (MFCCs) with prosodic and other long-term features. First, we present a framework to study the speaker discriminability of 70 different long-term features. Then, we show how the top-ranked long-term features can be combined with short-term features to increase the accuracy of speaker diarization. The results were measured on standardized datasets (NIST RT) and show a consistent relative improvement of about 30% in diarization error rate compared to the best system presented at the NIST evaluation in 2007.