This survey covers two challenging speech processing topics: speaker segmentation and speaker clustering. Speaker segmentation aims to find speaker change points in an audio stream, whereas speaker clustering aims to group speech segments by speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. For speaker clustering, deterministic and probabilistic algorithms are examined. The reviewed algorithms are compared, their advantages and disadvantages are indicated, insight into the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering.
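As an illustration of what a metric-based speaker segmentation step can look like, the following is a minimal sketch, not a method taken from the survey itself. It assumes the Bayesian Information Criterion (BIC) as the distance measure, models each side of a candidate change point with a single full-covariance Gaussian, and treats the input as a matrix of per-frame feature vectors (for example MFCCs). The function names `delta_bic` and `find_change_point`, the `margin` and `penalty_weight` parameters, and the small covariance regularizer are all hypothetical choices made for this example.

```python
import numpy as np


def delta_bic(features, t, penalty_weight=1.0):
    """Delta-BIC for splitting `features` (n_frames x n_dims) at frame t.

    Positive values favour modelling the two sides with separate
    Gaussians, i.e. they suggest a speaker change at t.
    """
    n, d = features.shape

    def logdet_cov(x):
        # Maximum-likelihood covariance, lightly regularized (assumption)
        # so that the log-determinant stays finite on short segments.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        cov = cov + 1e-6 * np.eye(d)
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    left, right = features[:t], features[t:]
    gain = 0.5 * (
        n * logdet_cov(features)
        - t * logdet_cov(left)
        - (n - t) * logdet_cov(right)
    )
    # Penalty for the extra mean vector and covariance matrix.
    n_extra_params = d + 0.5 * d * (d + 1)
    penalty = 0.5 * penalty_weight * n_extra_params * np.log(n)
    return gain - penalty


def find_change_point(features, margin=20, penalty_weight=1.0):
    """Return the frame index with the largest positive Delta-BIC, or None."""
    n = features.shape[0]
    candidates = list(range(margin, n - margin))
    if not candidates:
        return None
    scores = [delta_bic(features, t, penalty_weight) for t in candidates]
    best = int(np.argmax(scores))
    return candidates[best] if scores[best] > 0 else None
```

Calling `find_change_point` on one analysis window returns at most one candidate change point. Handling a whole recording would need a sliding or growing window around this function, and the resulting segments could then be passed to a clustering stage; both are left out of the sketch.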