In this paper, we propose an efficient speaker clustering approach based on a locality preserving linear projection in the Gaussian mixture model (GMM) mean supervector space. Although the GMM mean supervector has proven to be an effective representation of speakers, its dimensionality is usually very high. The locality preserving projection (LPP) maps the high-dimensional GMM mean supervector space into a lower-dimensional subspace in an unsupervised fashion, such that the local neighborhood structure of the data points is optimally preserved. Our speaker clustering experiments show that in the reduced-dimensional LPP subspace, traditional clustering techniques such as k-means and hierarchical clustering perform significantly better than they do in the original high-dimensional GMM mean supervector space or in its principal component subspace.
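To make the projection step concrete, the following is a minimal sketch of a standard LPP formulation, not the authors' implementation. It builds a k-nearest-neighbor graph with heat-kernel weights W, forms the degree matrix D and the graph Laplacian L = D - W, and keeps the eigenvectors with the smallest eigenvalues of the generalized problem X^T L X a = lambda X^T D X a. Several things here are assumptions made for illustration: the function name `lpp`, its parameters (`n_neighbors`, `t`, `reg`), the centering of the data, the small ridge term added to keep X^T D X invertible, and the synthetic vectors that stand in for GMM mean supervectors.

```python
import numpy as np

def lpp(X, n_components=2, n_neighbors=5, t=None, reg=1e-6):
    """Sketch of locality preserving projection (hypothetical helper).

    X: (n_samples, n_features) array, one row per data point.
    Returns a (n_features, n_components) projection matrix.
    """
    X = X - X.mean(axis=0)                      # centering is an assumption here
    n, d = X.shape

    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)                # a point is not its own neighbor

    # k-nearest-neighbor graph with heat-kernel weights.
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    if t is None:
        t = d2[rows, cols].mean()               # kernel width heuristic (assumption)
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / t)
    W = np.maximum(W, W.T)                      # symmetrize the graph

    D = np.diag(W.sum(axis=1))                  # degree matrix
    L = D - W                                   # graph Laplacian

    A_mat = X.T @ L @ X
    B_mat = X.T @ D @ X
    B_mat = B_mat + reg * np.trace(B_mat) / d * np.eye(d)   # ridge term (assumption)

    # Reduce the generalized problem to a standard symmetric one via Cholesky.
    C = np.linalg.cholesky(B_mat)               # B_mat = C @ C.T
    M = np.linalg.solve(C, np.linalg.solve(C, A_mat).T)
    M = 0.5 * (M + M.T)
    eigvals, eigvecs = np.linalg.eigh(M)        # eigenvalues in ascending order

    # Map back and keep the directions with the smallest eigenvalues.
    return np.linalg.solve(C.T, eigvecs[:, :n_components])

# Synthetic stand-in for supervectors: two well-separated groups in 10 dimensions.
rng = np.random.default_rng(0)
group0 = rng.normal(0.0, 1.0, size=(40, 10))
group1 = rng.normal(10.0, 1.0, size=(40, 10))
X = np.vstack([group0, group1])

A = lpp(X, n_components=2, n_neighbors=5)
Y = (X - X.mean(axis=0)) @ A                    # low-dimensional embedding
```

The rows of `Y` can then be passed to any conventional clustering routine, such as k-means or hierarchical clustering. With real supervectors the feature dimension typically far exceeds the number of utterances, so a preliminary PCA step or a larger ridge term would probably be needed before X^T D X can be factorized reliably.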