A review on speaker diarization systems and approaches

Authors:
M. H. Moattar; M. M. Homayounpour
Affiliations:
Laboratory for Intelligent Multimedia Processing (IMP), Computer Engineering and Information Technology Department, Amirkabir University of Technology, Tehran, Iran (both authors)
Venue:
Speech Communication
Year:
2012

Abstract

Speaker indexing, or diarization, is an important task in audio processing and retrieval. Speaker diarization is the process of annotating a speech signal with labels that correspond to the identities of the speakers, i.e. determining who spoke when. This paper presents a comprehensive review of the evolution of this technology and of the different approaches to speaker indexing, with a detailed discussion of these approaches and their contributions. It reviews the most common features used for speaker diarization, as well as the most important approaches to speech activity detection (SAD) within diarization frameworks. The two main tasks of speaker indexing are speaker segmentation and speaker clustering. The approaches proposed for each of these subtasks are reviewed separately, and speaker diarization systems that combine the two tasks in a unified framework are also introduced. A further discussion covers approaches to online speaker indexing, which differs fundamentally from traditional offline indexing. The paper also introduces the most common performance measures and evaluation datasets. Finally, a complete framework for speaker indexing is proposed that aims to be domain independent, parameter free, and applicable to both online and offline applications.
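The abstract names speaker segmentation as one of the two main subtasks. As an illustration only, and not as the framework proposed in this paper, a widely used metric-based segmentation criterion is the ΔBIC test: a window of feature frames is split at a candidate point, and a speaker change is hypothesized when two Gaussians explain the data better than one after a model-complexity penalty. The sketch below assumes frame-level features (e.g. MFCCs) in a NumPy array of shape (frames, dims) and one full-covariance Gaussian per segment; the function names and the `lam` and `margin` parameters are hypothetical choices made for this sketch.

```python
import numpy as np


def log_det_cov(x: np.ndarray) -> float:
    """Log-determinant of the maximum-likelihood (biased) covariance of x (frames x dims)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("singular covariance: use more frames or fewer dimensions")
    return float(logdet)


def delta_bic(x: np.ndarray, t: int, lam: float = 1.0) -> float:
    """Delta-BIC for splitting the window x at frame t.

    Positive values favour two separate Gaussians (a speaker change at t);
    negative values favour a single Gaussian (the same speaker throughout).
    """
    n, d = x.shape
    n1, n2 = t, n - t
    # Free parameters of one full-covariance Gaussian: d means + d(d+1)/2 covariances.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return float(
        0.5 * n * log_det_cov(x)
        - 0.5 * n1 * log_det_cov(x[:t])
        - 0.5 * n2 * log_det_cov(x[t:])
        - lam * penalty
    )


def detect_change(x: np.ndarray, margin: int = 20, lam: float = 1.0):
    """Scan candidate split points; return (t, delta) for the best one if delta > 0, else None."""
    n = x.shape[0]
    if n <= 2 * margin:
        return None  # window too short to estimate both sub-models
    scores = [(t, delta_bic(x, t, lam)) for t in range(margin, n - margin)]
    best_t, best_score = max(scores, key=lambda s: s[1])
    return (best_t, best_score) if best_score > 0 else None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speaker_a = rng.normal(0.0, 1.0, size=(200, 4))  # synthetic "speaker A" frames
    speaker_b = rng.normal(3.0, 1.0, size=(200, 4))  # synthetic "speaker B" frames
    print(detect_change(np.vstack([speaker_a, speaker_b])))
```

On the synthetic data the detected split lands at or very near frame 200, where the two synthetic sources meet. The same ΔBIC quantity, computed between two existing segments rather than two halves of one window, is also commonly used as a merge criterion in agglomerative speaker clustering, which is one reason BIC-style tests appear in both subtasks; the penalty weight `lam` is the usual tuning knob.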