Unfolding speaker clustering potential: a biomimetic approach

  • Authors:
  • Thilo Stadelmann;Bernd Freisleben

  • Affiliations:
  • University of Marburg, Marburg, Germany;University of Marburg, Marburg, Germany

  • Venue:
  • MM '09 Proceedings of the 17th ACM international conference on Multimedia
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Speaker clustering is the task of grouping a set of speech utterances into speaker-specific classes. The basic techniques for solving this task are similar to those used for speaker verification and identification. The hypothesis of this paper is that the techniques originally developed for speaker verification and identification are not sufficiently discriminative for speaker clustering. However, the processing chain for speaker clustering is quite large - there are many potential areas for improvement. The question is: where should improvements be made to improve the final result? To answer this question, this paper takes a biomimetic approach based on a study with human participants acting as an automatic speaker clustering system. Our findings are twofold: it is the stage of modeling that has the highest potential, and information with respect to the temporal succession of frames is crucially missing. Experimental results with our implementation of a speaker clustering system incorporating our findings and applying it on TIMIT data show the validity of our approach.