An overview of automatic speaker diarization systems

Authors:
S. E. Tranter; D. A. Reynolds
Affiliations:
Dept. of Eng., Cambridge Univ.;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Abstract

Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources, and other signal source/channel characteristics. Diarization can be used to help speech recognition, to facilitate the searching and indexing of audio archives, and to enrich automatic transcriptions, making them more readable. In this paper, we provide an overview of the approaches currently used in a key area of audio diarization, namely speaker diarization, and discuss their relative merits and limitations. The performance of the different techniques is compared within the framework of the speaker diarization task in the DARPA EARS Rich Transcription evaluations. We also look at how the techniques are being introduced into real broadcast news systems and their portability to other domains and tasks such as meetings and speaker verification.
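
As an illustration only (not taken from the paper), the kind of annotation the abstract describes can be sketched as a list of time regions, each attributed to a source, where regions are allowed to overlap. The `Segment` type, the helper functions, and the labels such as `speaker_A` are hypothetical names chosen for this sketch, and the times are made-up values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One annotated region of the audio channel (hypothetical structure)."""
    start: float  # region start time, in seconds
    end: float    # region end time, in seconds
    source: str   # label of the attributed source, e.g. a speaker or "music"


def overlaps(a: Segment, b: Segment) -> bool:
    """Return True if the two regions share any stretch of time."""
    return a.start < b.end and b.start < a.end


def time_per_source(segments: list[Segment]) -> dict[str, float]:
    """Sum the annotated duration attributed to each source label."""
    totals: dict[str, float] = {}
    for seg in segments:
        totals[seg.source] = totals.get(seg.source, 0.0) + (seg.end - seg.start)
    return totals


# Made-up example: music fades out under speaker A, who is then
# briefly overlapped by speaker B.
annotation = [
    Segment(0.0, 4.0, "music"),
    Segment(3.0, 10.0, "speaker_A"),
    Segment(9.0, 15.0, "speaker_B"),
]
```

Here `overlaps(annotation[0], annotation[1])` is true because the music and the first speaker share the interval from 3 to 4 seconds, which is the "possibly overlapping" case mentioned in the abstract. Producing such segments automatically from audio is the diarization task itself and is outside the scope of this sketch.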