A discriminative training approach for text-independent speaker recognition

Authors:
Q. Y. Hong; S. Kwong
Affiliations:
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong and Department of Computer Science, Xiamen University, Xiamen, PR China; Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
Venue:
Signal Processing
Year:
2005

Abstract

The Gaussian mixture model (GMM) is commonly used for text-independent speaker recognition. Its parameters are generally estimated under the maximum likelihood (ML) criterion. However, this criterion uses only each speaker's own labeled utterances to train that speaker's model, and it is very likely to converge to a locally optimal solution. To address this problem, this paper proposes a discriminative training approach based on the maximum model distance (MMD) criterion. We investigate the characteristics of speaker recognition and further propose an associated strategy for selecting competing speakers. Experimental results on the KING and TIMIT databases demonstrate that our training approach effectively improves the performance of speaker identification and verification. With three training sentences per speaker, the verification equal error rate (EER) for 168 TIMIT speakers was reduced by 30.4% compared with the conventional method.
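
The abstract does not spell out the MMD update rules, so the following is only a minimal sketch of the conventional pipeline it is compared against: scoring an utterance against per-speaker GMMs, picking the best-scoring speaker for identification, and computing an approximate EER for verification. The diagonal-covariance form, the function names, and the threshold sweep are illustrative assumptions, not details taken from the paper.

```python
import math


def gmm_avg_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    Assumption: diagonal covariances, given as per-dimension variances.
    """
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            # log(weight) + log N(x; mu, diag(var)), summed over dimensions
            log_p = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                log_p += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            comp_logs.append(log_p)
        # log-sum-exp over mixture components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total / len(frames)


def identify(frames, speaker_models):
    """Return the speaker whose model gives the highest average log-likelihood.

    `speaker_models` maps a speaker id to a (weights, means, variances) tuple.
    """
    return max(
        speaker_models,
        key=lambda s: gmm_avg_log_likelihood(frames, *speaker_models[s]),
    )


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping thresholds over the observed scores.

    Returns the mean of the false-acceptance and false-rejection rates at
    the threshold where the two rates are closest.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer


# Toy usage with hypothetical one-dimensional, single-component models.
models = {
    "spk_a": ([1.0], [[0.0]], [[1.0]]),
    "spk_b": ([1.0], [[5.0]], [[1.0]]),
}
print(identify([[0.1], [-0.2]], models))
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))
```

In this baseline each speaker's model is scored independently of the others. A discriminative criterion such as the one the abstract describes would additionally use data from competing speakers when estimating each model's parameters.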