This paper describes recent advances in speaker diarization with a multistage segmentation and clustering system that incorporates a speaker identification step. The system builds upon the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner provides high cluster purity but tends to split the data of speakers with a large quantity of data into several segment clusters. Several improvements to the baseline system have been made. First, the iterative Gaussian mixture model (GMM) clustering has been replaced by Bayesian information criterion (BIC) agglomerative clustering. Second, an additional clustering stage has been added, using a GMM-based speaker identification method. Finally, a post-processing stage refines the segment boundaries using the output of a transcription system. On the National Institute of Standards and Technology (NIST) RT-04F and ESTER evaluation data, the multistage system reduces the speaker error by over 70% relative to the baseline system, and by between 40% and 50% relative to a single-stage BIC clustering system.
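
As an illustration of the BIC agglomerative clustering step mentioned above, the following is a minimal sketch, not the system's actual implementation. It assumes each segment is a matrix of acoustic feature vectors modelled by a single full-covariance Gaussian, and it uses the commonly cited delta-BIC merge criterion with a penalty weight `lam`. The function names, the penalty weight, and the stop-when-no-negative-score rule are assumptions made for the example.

```python
import numpy as np

def logdet_cov(x):
    """Log-determinant of the maximum-likelihood covariance of x, shape (n, d)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    cov = cov + 1e-6 * np.eye(x.shape[1])  # small ridge for numerical stability
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def delta_bic(a, b, lam=1.0):
    """Delta-BIC for merging clusters a and b; negative values favour merging."""
    n_a, n_b = len(a), len(b)
    n = n_a + n_b
    d = a.shape[1]
    merged = np.vstack([a, b])
    # Penalty for the extra Gaussian (mean plus full covariance) kept when not merging.
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    gain = 0.5 * (n * logdet_cov(merged)
                  - n_a * logdet_cov(a)
                  - n_b * logdet_cov(b))
    return gain - penalty

def bic_agglomerative(segments, lam=1.0):
    """Greedily merge the pair with the lowest delta-BIC until none is negative.

    Returns a list of clusters, each a list of original segment indices.
    """
    clusters = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(clusters))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # no remaining pair is better modelled as one speaker
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        members[i] = members[i] + members[j]
        del clusters[j]
        del members[j]
    return members
```

Calling `bic_agglomerative(segments)` on a list of `(n_frames, n_features)` arrays returns groups of segment indices, one group per hypothesised speaker. In this sketch a larger `lam` makes merging more likely, so it trades cluster purity against the over-splitting of speakers with a large quantity of data.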