Review: Speaker segmentation and clustering

Authors:
Margarita Kotti; Vassiliki Moschou; Constantine Kotropoulos
Affiliations:
Artificial Intelligence and Information Analysis Lab, Department of Informatics, Aristotle University of Thessaloniki, Box 451, Thessaloniki 54124, Greece (all authors)
Venue:
Signal Processing
Year:
2008

Abstract

This survey focuses on two challenging speech processing topics: speaker segmentation and speaker clustering. Speaker segmentation aims to find speaker change points in an audio stream, whereas speaker clustering aims to group speech segments according to speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. For speaker clustering, deterministic and probabilistic algorithms are examined. The reviewed algorithms are assessed comparatively, their advantages and disadvantages are indicated, insight into their operation is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering.
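
As a rough illustration of what a metric-based segmentation step might look like, the sketch below scores every candidate split of a feature stream with a BIC-style criterion and reports the best-scoring split as a speaker change point. The abstract names no specific criterion, so the choice of BIC, the single-Gaussian model, the one-dimensional synthetic features, and the names `delta_bic`, `find_change_point`, `margin`, and `lam` are all assumptions made for illustration, not the survey's method.

```python
import math
import random


def log_var(seg):
    """Log of the maximum-likelihood variance of a 1-D segment."""
    mean = sum(seg) / len(seg)
    var = sum((v - mean) ** 2 for v in seg) / len(seg)
    return math.log(max(var, 1e-12))  # floor avoids log(0) on constant data


def delta_bic(x, i, lam=1.0):
    """BIC gain of modelling x as two Gaussians split at index i
    instead of one Gaussian. Positive values favour a change point.
    (Hypothetical helper; 1-D features assumed.)"""
    n = len(x)
    gain = 0.5 * (n * log_var(x) - i * log_var(x[:i]) - (n - i) * log_var(x[i:]))
    # The two-model hypothesis has 2 extra parameters (mean and variance).
    penalty = 0.5 * lam * 2 * math.log(n)
    return gain - penalty


def find_change_point(x, margin=20, lam=1.0):
    """Return (index, score) of the best split, or (None, score) if no
    split has a positive score. `margin` keeps both sides long enough
    to estimate a variance."""
    candidates = range(margin, len(x) - margin)
    scores = [delta_bic(x, i, lam) for i in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] > 0:
        return candidates[best], scores[best]
    return None, scores[best]


# Synthetic stand-in for acoustic features: two "speakers" with
# different means, 200 frames each (assumed data, not from the survey).
random.seed(0)
stream = [random.gauss(0.0, 1.0) for _ in range(200)]
stream += [random.gauss(3.0, 1.0) for _ in range(200)]

cp, score = find_change_point(stream)
print("estimated change point:", cp)
```

In a real system the scalar samples would be replaced by multi-dimensional feature vectors with full covariance matrices, and the search would run over a sliding window so that several change points can be found.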