Fishervoice and semi-supervised speaker clustering

  • Authors:
  • Stephen M. Chu; Hao Tang; Thomas S. Huang

  • Affiliations:
  • IBM T. J. Watson Research Center, Yorktown Heights, N.Y. 10598, USA; Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, 61801, USA; Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, 61801, USA

  • Venue:
  • ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Year:
  • 2009

Abstract

Speaker subspace modeling has become increasingly important in speaker recognition, diarization, and clustering. Principal component analysis (PCA) is a popular linear subspace learning technique; representing an arbitrary utterance or speaker as a linear combination of a set of PCA-derived basis voices is known as the eigenvoice approach. In this paper, we propose a novel technique, the fishervoice approach. It is based on linear discriminant analysis (LDA), another successful linear subspace learning technique, which provides an optimized low-dimensional representation of utterances or speakers with a focus on the most discriminative basis voices. We apply the fishervoice approach to speaker clustering in a semi-supervised manner and show that it significantly outperforms the eigenvoice approach in all our experiments on the GALE Mandarin dataset.
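
The contrast between the two subspaces can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each utterance is already summarized as a fixed-length vector (here synthetic 5-dimensional data standing in for real utterance features), and the function names `eigenvoice_basis`, `fishervoice_basis`, and `fisher_ratio` are hypothetical. The PCA basis follows the directions of largest total variance, while the LDA basis follows the directions that best separate labeled speakers relative to within-speaker spread.

```python
import numpy as np

def eigenvoice_basis(X, k):
    """PCA sketch: top-k principal directions of the centered data, as columns."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T

def fishervoice_basis(X, labels, k, reg=1e-6):
    """LDA sketch: top-k directions maximizing between- over within-speaker scatter."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-speaker scatter
    Sb = np.zeros((d, d))  # between-speaker scatter
    for spk in np.unique(labels):
        Xs = X[labels == spk]
        mu = Xs.mean(axis=0)
        Sw += (Xs - mu).T @ (Xs - mu)
        diff = (mu - mean)[:, None]
        Sb += len(Xs) * (diff @ diff.T)
    Sw += reg * np.eye(d)  # small ridge term (an assumption) keeps Sw invertible
    # Eigenvectors of Sw^{-1} Sb, sorted by decreasing eigenvalue.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)
    return evecs[:, order[:k]].real

def fisher_ratio(Z, labels):
    """Between-speaker over within-speaker scatter of projected data Z."""
    mean = Z.mean(axis=0)
    sb = sw = 0.0
    for spk in np.unique(labels):
        Zs = Z[labels == spk]
        mu = Zs.mean(axis=0)
        sb += len(Zs) * np.sum((mu - mean) ** 2)
        sw += np.sum((Zs - mu) ** 2)
    return sb / sw

# Synthetic data (an assumption, not the GALE Mandarin set): 3 speakers,
# 50 utterances each. Dimension 0 carries large speaker-independent variation;
# dimension 1 carries the small but consistent speaker differences.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 50)
X = rng.normal(scale=0.5, size=(150, 5))
X[:, 0] = rng.normal(scale=10.0, size=150)
X[:, 1] += np.array([-2.0, 0.0, 2.0])[labels]

W_pca = eigenvoice_basis(X, k=1)
W_lda = fishervoice_basis(X, labels, k=1)
ratio_pca = fisher_ratio(X @ W_pca, labels)
ratio_lda = fisher_ratio(X @ W_lda, labels)
print(f"Fisher ratio, PCA direction: {ratio_pca:.3f}")
print(f"Fisher ratio, LDA direction: {ratio_lda:.3f}")
```

In this toy setting the leading PCA direction is dominated by the high-variance nuisance dimension, whereas the LDA direction aligns with the dimension that separates the speakers. In a semi-supervised setup, such a basis would presumably be learned from labeled utterances and then used to project unlabeled utterances before clustering; the paper's actual features and clustering procedure are not reproduced here.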