Fishervoice and semi-supervised speaker clustering

Authors:
Stephen M. Chu;Hao Tang;Thomas S. Huang
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, N.Y. 10598, USA;Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, 61801, USA;Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, 61801, USA
Venue:
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Year:
2009

Citing: 0
Cited by: 4

A review on speaker diarization systems and approaches
Speech Communication

Singing speaker clustering based on subspace learning in the GMM mean supervector space
Speech Communication

Pattern classification and clustering: A review of partially supervised learning approaches
Pattern Recognition Letters

A unified framework for domain independent online speaker indexing in eigen-voice space using an index tree of reference models
International Journal of Speech Technology

Abstract

Speaker subspace modeling has become increasingly important in speaker recognition, diarization, and clustering. Principal component analysis (PCA) is a popular linear subspace learning technique, and the PCA-based approach that represents an arbitrary utterance or speaker as a linear combination of a set of basis voices is known as the eigenvoice approach. In this paper, we propose a novel technique, the fishervoice approach. It is based on linear discriminant analysis (LDA), another successful linear subspace learning technique, which provides an optimized low-dimensional representation of utterances or speakers that focuses on the most discriminative basis voices. We apply the fishervoice approach to speaker clustering in a semi-supervised manner and show that it significantly outperforms the eigenvoice approach in all our experiments on the GALE Mandarin dataset.
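
The contrast the abstract draws between a PCA basis (eigenvoices) and a linear discriminant analysis basis (fishervoices) can be illustrated with a small sketch. This is not the authors' implementation: the data are synthetic stand-ins for speaker supervectors, and the function names (`eigenvoice_basis`, `fishervoice_basis`, `discriminability`), the regularization constant `reg`, and the toy dimensions are all assumptions made for illustration. The sketch learns both bases from labeled utterances and compares how well each low-dimensional projection separates speakers, using the criterion trace(Sw^-1 Sb) computed in the projected space.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (Sw) and between-class (Sb) scatter of the rows of X grouped by y."""
    grand_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for label in np.unique(y):
        Xs = X[y == label]
        mu = Xs.mean(axis=0)
        Sw += (Xs - mu).T @ (Xs - mu)
        Sb += len(Xs) * np.outer(mu - grand_mean, mu - grand_mean)
    return Sw, Sb

def eigenvoice_basis(X, k):
    """PCA: the top-k principal directions of the centered data, as columns."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def fishervoice_basis(X, y, k, reg=1e-6):
    """LDA: top-k solutions of Sb w = lambda Sw w, via Cholesky whitening of Sw.

    reg is a small ridge term (an assumption of this sketch) that keeps Sw invertible.
    """
    Sw, Sb = scatter_matrices(X, y)
    L = np.linalg.cholesky(Sw + reg * np.eye(Sw.shape[0]))
    A = np.linalg.solve(L, Sb)        # L^-1 Sb
    M = np.linalg.solve(L, A.T).T     # L^-1 Sb L^-T
    M = (M + M.T) / 2                 # remove numerical asymmetry
    _, V = np.linalg.eigh(M)          # eigenvalues in ascending order
    V = V[:, ::-1][:, :k]             # keep the k largest
    return np.linalg.solve(L.T, V)    # map back: w = L^-T v

def discriminability(Z, y):
    """trace(Sw^-1 Sb) in the projected space; larger means better speaker separation."""
    Sw, Sb = scatter_matrices(Z, y)
    return np.trace(np.linalg.solve(Sw, Sb))

# Synthetic stand-in for supervectors: speakers differ only in dimensions 0 and 1,
# while dimensions 2 to 4 carry large speaker-independent (nuisance) variance.
rng = np.random.default_rng(0)
n_speakers, n_utts, dim = 4, 50, 5
noise_std = np.array([0.3, 0.3, 5.0, 5.0, 5.0])
speaker_means = np.zeros((n_speakers, dim))
speaker_means[:, :2] = rng.normal(size=(n_speakers, 2))
X = np.vstack([m + noise_std * rng.normal(size=(n_utts, dim)) for m in speaker_means])
y = np.repeat(np.arange(n_speakers), n_utts)

W_pca = eigenvoice_basis(X, 2)
W_lda = fishervoice_basis(X, y, 2)
Xc = X - X.mean(axis=0)
J_pca = discriminability(Xc @ W_pca, y)
J_lda = discriminability(Xc @ W_lda, y)
print("PCA projection criterion:", J_pca)
print("LDA projection criterion:", J_lda)
```

In this toy setup PCA follows the high-variance nuisance dimensions, whereas the discriminant basis follows the dimensions along which speakers actually differ. In a semi-supervised setting, one plausible reading is that a basis learned from labeled speakers would then be used to project unlabeled utterances before clustering; that step is not shown here.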