Minimizing speaker variation effects for speaker-independent speech recognition

Authors: Xuedong Huang
Affiliations: Carnegie Mellon University, Pittsburgh, PA
Venue: HLT '91 Proceedings of the workshop on Speech and Natural Language
Year: 1992

Abstract

Speaker variation is one of the major sources of error in speaker-independent speech recognition. This paper constructs a speaker-independent normalization network that minimizes the effects of speaker variation. Multiple speaker clusters are first built from the speaker-independent training database, and a codeword-dependent neural network is associated with each cluster. The cluster containing the largest number of speakers is designated the golden cluster. The objective function minimizes the distortion between the acoustic data in each cluster and the acoustic data in the golden cluster. In the performance evaluation, the speaker-normalized front end reduced the error rate by 15% on the DARPA Resource Management speaker-independent speech recognition task.
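
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions: the speaker-to-cluster assignment is already available, frames from a non-golden cluster are already aligned one-to-one with golden-cluster frames, and each codeword-dependent neural network is replaced by a simple per-codeword affine map fitted by least squares. All function names (pick_golden_cluster, fit_codeword_maps, normalize, distortion) are hypothetical.

```python
from collections import Counter

import numpy as np


def pick_golden_cluster(speaker_to_cluster):
    """Return the id of the cluster that contains the most speakers."""
    counts = Counter(speaker_to_cluster.values())
    return counts.most_common(1)[0][0]


def fit_codeword_maps(src, tgt, codes, n_codewords):
    """Fit one affine map per codeword (a stand-in for the neural networks).

    src, tgt: (n_frames, dim) arrays of aligned frame pairs; the alignment
              between a cluster and the golden cluster is assumed given.
    codes:    (n_frames,) codeword index of each source frame.
    Returns a dict mapping codeword -> (dim + 1, dim) weight matrix.
    """
    dim = src.shape[1]
    identity = np.vstack([np.eye(dim), np.zeros((1, dim))])
    maps = {}
    for k in range(n_codewords):
        mask = codes == k
        if mask.sum() <= dim:
            # Too few frames to fit a map: leave these frames unchanged.
            maps[k] = identity
            continue
        x = np.hstack([src[mask], np.ones((mask.sum(), 1))])
        # Least squares minimizes the squared distortion to the golden frames.
        maps[k], *_ = np.linalg.lstsq(x, tgt[mask], rcond=None)
    return maps


def normalize(src, codes, maps):
    """Apply the map selected by each frame's codeword."""
    x = np.hstack([src, np.ones((len(src), 1))])
    out = src.astype(float)  # astype returns a copy
    for k, w in maps.items():
        mask = codes == k
        out[mask] = x[mask] @ w
    return out


def distortion(a, b):
    """Mean squared Euclidean distance between paired frames."""
    return float(np.mean(np.sum((a - b) ** 2, axis=1)))
```

In this sketch, a normalization front end would call pick_golden_cluster once, fit one set of maps per non-golden cluster, and pass each incoming frame through normalize before recognition. Comparing distortion before and after normalization on held-out aligned frames gives a rough check that the maps move a cluster toward the golden cluster.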