Multistage speaker diarization of broadcast news

Authors:
C. Barras;Xuan Zhu;S. Meignier;J.-L. Gauvain
Affiliations:
Eng. Sci.-Nat. Center for Sci. Res., LIMSI-CNRS, Orsay;-;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Citing 0
Cited 22

Review: Speaker segmentation and clustering

Signal Processing
Speaker diarization using one-class support vector machines

Speech Communication
The LIMSI RT07 Lecture Transcription System

Multimodal Technologies for Perception of Humans
Progress in the AMIDA Speaker Diarization System for Meeting Data

Multimodal Technologies for Perception of Humans
Multi-stage Speaker Diarization for Conference and Lecture Meetings

Multimodal Technologies for Perception of Humans
Speech Processing for Audio Indexing

GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
Spherical discriminant analysis in semi-supervised speaker clustering

NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Improved features and models for detecting edit disfluencies in transcribing spontaneous Mandarin speech

IEEE Transactions on Audio, Speech, and Language Processing
An information theoretic approach to speaker diarization of meeting data

IEEE Transactions on Audio, Speech, and Language Processing
Locality preserving speaker clustering

ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo
Identification of Soundbite and Its Speaker Name Using Transcripts of Broadcast News Speech

ACM Transactions on Asian Language Information Processing (TALIP)
BIC-based speaker segmentation using divide-and-conquer strategies with application to speaker diarization

IEEE Transactions on Audio, Speech, and Language Processing
Speaker diarization exploiting the eigengap criterion and cluster ensembles

IEEE Transactions on Audio, Speech, and Language Processing
Discrimination of speech from nonspeech in broadcast news based on modulation frequency features

Speech Communication
Speaker diarization: from broadcast news to lectures

MLMI'06 Proceedings of the Third international conference on Machine Learning for Multimodal Interaction
A review on speaker diarization systems and approaches

Speech Communication
Hierarchical framework for plot de-interlacing of TV series based on speakers, dialogues and images

Proceedings of the 2012 ACM international workshop on Audio and multimedia methods for large-scale video analysis
Fusion of speech, faces and text for person identification in TV broadcast

ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part III
Eigenvoice modelling for cross likelihood ratio based speaker clustering: A Bayesian approach

Computer Speech and Language
SocioPhone: everyday face-to-face interaction monitoring platform using multi-phone sensor fusion

Proceeding of the 11th annual international conference on Mobile systems, applications, and services
A unified framework for domain independent online speaker indexing in eigen-voice space using an index tree of reference models

International Journal of Speech Technology
Spontaneous speech and opinion detection: mining call-centre transcripts

Language Resources and Evaluation

Abstract

This paper describes recent advances in speaker diarization with a multistage segmentation and clustering system that incorporates a speaker identification step. The system builds on the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner provides high cluster purity but tends to split the data of speakers with a large quantity of data into several segment clusters. Several improvements have been made to the baseline system. First, the iterative Gaussian mixture model (GMM) clustering has been replaced by Bayesian information criterion (BIC) agglomerative clustering. Second, an additional clustering stage has been added, using a GMM-based speaker identification method. Finally, a post-processing stage refines the segment boundaries using the output of a transcription system. On the National Institute of Standards and Technology (NIST) RT-04F and ESTER evaluation data, the multistage system reduces the speaker error by over 70% relative to the baseline system, and by 40% to 50% relative to a single-stage BIC clustering system.
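
The BIC agglomerative clustering stage named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulation, so everything below is an assumption: the sketch uses the common delta-BIC merge criterion with a single full-covariance Gaussian per cluster, a penalty weight `lam`, and a penalty computed from the combined size of the two clusters being compared. The function names `delta_bic` and `bic_cluster` are hypothetical, and the GMM-based speaker identification stage and the boundary post-processing are not covered.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Delta-BIC between modelling x and y jointly versus separately.

    x, y: arrays of shape (frames, dims). A negative value favours merging.
    Assumption: one full-covariance Gaussian per cluster.
    """
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    d = x.shape[1]

    def logdet(z):
        # Maximum-likelihood covariance, then its log-determinant.
        cov = np.atleast_2d(np.cov(z, rowvar=False, bias=True))
        return np.linalg.slogdet(cov)[1]

    # Number of free parameters of one Gaussian: mean plus covariance.
    n_params = d + 0.5 * d * (d + 1)
    penalty = 0.5 * lam * n_params * np.log(n)
    gain = 0.5 * (n * logdet(np.vstack([x, y]))
                  - n_x * logdet(x) - n_y * logdet(y))
    return gain - penalty


def bic_cluster(segments, lam=1.0):
    """Greedy agglomerative clustering of segments.

    Repeatedly merges the pair with the lowest delta-BIC and stops when
    no pair has a negative delta-BIC. Returns lists of segment indices.
    """
    clusters = [np.asarray(s, dtype=float) for s in segments]
    members = [[i] for i in range(len(clusters))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:
            break  # stopping criterion: no merge is favoured
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        members[i] = members[i] + members[j]
        del clusters[j]
        del members[j]
    return members
```

In this sketch, raising `lam` makes merging easier and so yields fewer, larger clusters, while lowering it favours purity. That trade-off mirrors the one the abstract describes between high cluster purity and speakers being split across several clusters.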