Speaker indexing, referred to in the literature as speaker diarization, is an important task in audio indexing and retrieval. It comprises two important and usually separate stages, speaker segmentation and speaker clustering, and can be performed either online or offline. This paper focuses on domain-independent online speaker indexing. For this purpose, the proposed framework should be parameter-free, requiring no application-specific parameters such as utterance duration or threshold settings. To reduce the dependency on parameters, traditional speaker segmentation is reformulated as a voting-based homogeneous speech segmentation, in which several approaches are applied in parallel to decide whether a change point exists. In online indexing, insufficient data is available at each time slice. The proposed framework therefore uses a set of reference speaker models as side information to facilitate online tracking. To improve indexing accuracy, adaptation approaches in the eigen-voice decomposition space are proposed. To reduce the computational cost of tracking, an index structure over the reference models is proposed to speed up the search in the model space. The proposed framework is evaluated on the 2002 Rich Transcription Broadcast News and Conversational Telephone Speech Corpus (in Garofolo, NIST Rich Transcription, 2002) as well as on a synthetic dataset. The indexing errors of the proposed framework on telephone conversations, broadcast news, and the synthetic dataset are 7.51%, 6.36%, and 9.34%, respectively. In addition, the index tree structure improves the tracking run time of the proposed framework by 32%.
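The voting idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which detectors are combined or how their votes are counted, so the strict-majority rule below is an assumption, and the names `majority_vote` and `detect_change` are hypothetical. Each detector is treated as a black box that inspects the speech on either side of a candidate boundary and returns a yes/no decision.

```python
from typing import Callable, Iterable, Sequence

# A detector receives the feature windows to the left and right of a
# candidate boundary and votes on whether a speaker change occurred.
# (Hypothetical interface; the abstract does not define one.)
Window = Sequence[float]
Detector = Callable[[Window, Window], bool]


def majority_vote(votes: Iterable[bool]) -> bool:
    """Return True only if strictly more than half of the votes are True.

    Assumption: a strict majority is required, so a tie counts as
    "no change". The source does not specify the voting rule.
    """
    votes = list(votes)
    if not votes:
        raise ValueError("at least one vote is required")
    return 2 * sum(votes) > len(votes)


def detect_change(left: Window, right: Window,
                  detectors: Sequence[Detector]) -> bool:
    """Run every detector on the same pair of windows and combine the votes."""
    return majority_vote(d(left, right) for d in detectors)
```

Because the combination rule only counts votes, individual detectors can be swapped or added without introducing a decision threshold at the voting stage itself.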