Supervised and unsupervised clustering of the speaker space for connectionist speech recognition

  • Authors:
  • Yochai Konig;Nelson Morgan

  • Affiliations:
  • International Computer Science Institute, Berkeley, CA;International Computer Science Institute, Berkeley, CA

  • Venue:
  • ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: plenary, special, audio, underwater acoustics, VLSI, neural networks - Volume I
  • Year:
  • 1993

Abstract

A challenging problem for speaker-independent continuous speech recognition systems is achieving good performance with a new speaker when the only available information about that speaker is the utterance to be recognized. We propose a first step towards a solution, based on clustering of the speaker space. Our study had two steps. First, we searched for a set of features with which to cluster speakers. Then, using the chosen features, we investigated two kinds of clustering: supervised, using two clusters (males and females), and unsupervised, using two, three, and five clusters. We integrated the cluster information into our connectionist speech recognition system by using the Speaker Cluster Neural Network (SCNN). The SCNN attempts to share the speaker-independent parameters and to model the cluster-dependent parameters. Our results show that the best performance is achieved with the supervised clusters, resulting in an overall improvement in recognition performance.
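
As a rough illustration of the unsupervised step, the sketch below groups per-speaker feature vectors with plain k-means. This is an assumption made for illustration only: the abstract names neither the clustering algorithm nor the features the authors chose. The function names, the two-dimensional toy vectors, and the parameter defaults are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, max_iterations=50, seed=0):
    """Plain k-means (Lloyd's algorithm); returns (centroids, labels).

    Illustrative stand-in only: the abstract does not say which
    clustering method was actually used.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [nearest(p, centroids) for p in points]
    for _ in range(max_iterations):
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels


# Hypothetical toy data: one made-up 2-D feature vector per speaker.
speaker_features = [
    (1.0, 1.1), (0.9, 1.0), (1.3, 0.7),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
]

centroids, labels = kmeans(speaker_features, k=2)
```

Calling `kmeans` with `k=3` or `k=5` on a larger speaker set would mirror the other cluster counts mentioned in the abstract. The supervised male/female condition needs no clustering algorithm at all, since each speaker's label is given directly.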