Supervised and unsupervised clustering of the speaker space for connectionist speech recognition

Authors: Yochai Konig; Nelson Morgan
Affiliations: International Computer Science Institute, Berkeley, CA (both authors)
Venue: ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: plenary, special, audio, underwater acoustics, VLSI, neural networks - Volume I
Year: 1993

Abstract

One of the challenging problems for a speaker-independent continuous speech recognition system is achieving good performance with a new speaker when the only available information about that speaker is the utterance to be recognized. We propose a first step towards a solution, based on clustering of the speaker space. Our study had two steps. First, we searched for a set of features with which to cluster speakers. Then, using the chosen features, we investigated two kinds of clustering: supervised, using two clusters (male and female), and unsupervised, using two, three, and five clusters. We integrated the cluster information into our connectionist speech recognition system by using the Speaker Cluster Neural Network (SCNN). The SCNN attempts to share the speaker-independent parameters and to model the cluster-dependent parameters. Our results show that the best performance is achieved with the supervised clusters, resulting in an overall improvement in recognition performance.
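
As an illustration of the unsupervised case only, the sketch below groups speakers by one feature vector per speaker. The abstract names neither the clustering algorithm nor the features, so the use of k-means, the function name `kmeans`, and the two-dimensional toy vectors are all assumptions made for this example, not the authors' method.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means over per-speaker feature vectors.

    Assumption: k-means stands in for the unspecified clustering method,
    and `points` holds one hypothetical feature vector per speaker.
    """
    rng = random.Random(seed)
    # Start from k distinct speakers chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each speaker to the nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy data (invented for illustration): six speakers, two features each.
speakers = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
labels, centroids = kmeans(speakers, k=2)
```

The same call with `k=3` or `k=5` would correspond to the other unsupervised configurations mentioned in the abstract. The supervised case needs no clustering step, since the cluster label is simply the speaker's recorded gender.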