Speaker recognition using G.729 speech codec parameters

Authors:
T. F. Quatieri; R. B. Dunn; D. A. Reynolds; J. P. Campbell; E. Singer
Affiliations:
Lincoln Lab., MIT, Lexington, MA, USA (first author; not listed for the others)
Venue:
ICASSP '00: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 02
Year:
2000

Abstract

Experiments in Gaussian-mixture-model (GMM) speaker recognition with mel-cepstra derived from mel-filter-bank energies (MFBs) of the G.729 codec's all-pole spectral envelope showed a significant performance loss relative to standard mel-cepstral coefficients computed from G.729 synthesized (coded) speech (Quatieri et al. 1999). In this paper, we investigate two approaches to recovering speaker recognition performance from G.729 parameters. The first, a parametric approach, makes explicit use of the G.729 parameters rather than deriving cepstra from MFBs of an all-pole spectrum. Specifically, the G.729 line spectral frequencies (LSFs) are converted to "direct" cepstral coefficients that are in one-to-one correspondence with the LSFs. The G.729 residual is also considered; in particular, appending G.729 pitch as a single parameter to the direct cepstral coefficients yields a further performance gain. The second, nonparametric approach retains the original MFB paradigm but adds harmonic striations to the G.729 all-pole spectral envelope. Although these methods yield considerable performance gains, they do not yet match the performance obtained from G.729 synthesized speech, which motivates representing additional fine structure of the G.729 residual.
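The parametric route from G.729 LSFs to "direct" cepstral coefficients can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes the LSFs arrive as ascending angular frequencies in radians with even order p (G.729 uses p = 10), and it assumes the "direct" cepstrum is the standard LPC cepstrum of the all-pole model 1/A(z): A(z) is rebuilt from the LSFs through the symmetric and antisymmetric polynomials P(z) and Q(z), then the usual cepstral recursion is applied. Keeping the first p coefficients preserves a one-to-one mapping, because the recursion can be run in reverse. The function names are hypothetical, and pitch is appended unscaled because the abstract does not say how it was represented.

```python
# Illustrative sketch; function names and the toy LSF values are hypothetical.
import numpy as np

def lsf_to_lpc(lsf):
    """Rebuild A(z) = 1 + a_1 z^-1 + ... + a_p z^-p from p ascending LSFs (radians, p even)."""
    lsf = np.asarray(lsf, dtype=float)
    p = lsf.size
    P = np.array([1.0])  # symmetric polynomial: roots at the 1st, 3rd, 5th, ... LSFs
    Q = np.array([1.0])  # antisymmetric polynomial: roots at the 2nd, 4th, 6th, ... LSFs
    for w in lsf[0::2]:
        P = np.convolve(P, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsf[1::2]:
        Q = np.convolve(Q, [1.0, -2.0 * np.cos(w), 1.0])
    P = np.convolve(P, [1.0, 1.0])   # restore the trivial root at z = -1
    Q = np.convolve(Q, [1.0, -1.0])  # restore the trivial root at z = +1
    return 0.5 * (P + Q)[: p + 1]    # A(z) = (P(z) + Q(z)) / 2; the z^-(p+1) terms cancel

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n of 1/A(z); the gain term c_0 is omitted.

    Recursion: c_n = -a_n - sum_{k=1}^{n-1} (k/n) c_k a_{n-k}, with a_n = 0 for n > p.
    """
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]

def cepstrum_to_lpc(c, p):
    """Invert the recursion for n = 1..p: c_1..c_p determine A(z), hence the LSFs, uniquely."""
    c = np.concatenate(([0.0], np.asarray(c, dtype=float)))
    a = np.zeros(p + 1)
    a[0] = 1.0
    for n in range(1, p + 1):
        a[n] = -c[n] - sum((k / n) * c[k] * a[n - k] for k in range(1, n))
    return a

def frame_features(lsf, pitch=None):
    """'Direct' cepstrum (first p coefficients) with an optional pitch value appended."""
    ceps = lpc_to_cepstrum(lsf_to_lpc(lsf), len(lsf))
    return ceps if pitch is None else np.append(ceps, pitch)

# Usage with toy ascending LSFs (p = 10) and a hypothetical pitch value
lsf = np.linspace(0.25, 2.9, 10)
features = frame_features(lsf, pitch=60.0)  # 10 cepstra followed by 1 pitch value
```

Running the recursion in reverse with `cepstrum_to_lpc` recovers A(z) exactly, which is the property that makes this representation invertible, unlike cepstra taken from mel-warped filter-bank energies.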
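The MFB paradigm, applied to the G.729 all-pole envelope with and without harmonic striations, can be sketched as below. The settings are illustrative assumptions, not values from the paper: 8 kHz sampling, a 512-point frequency grid, 24 triangular mel filters between 200 and 3500 Hz, and 19 cepstral coefficients. The striations are modeled here as a hypothetical comb of Gaussian lobes at multiples of a fundamental frequency, multiplied onto the envelope; the paper's actual way of imposing harmonics may differ.

```python
# Illustrative sketch; function names, band edges, and lobe width are hypothetical.
import numpy as np

FS = 8000    # G.729 codes 8 kHz telephone speech
N_FFT = 512  # assumed frequency-grid size

def allpole_power(a, n_fft=N_FFT):
    """Power envelope |1/A(e^jw)|^2 on n_fft//2 + 1 bins from 0 Hz to FS/2."""
    return 1.0 / np.abs(np.fft.rfft(a, n_fft)) ** 2

def harmonic_comb(f0_hz, n_fft=N_FFT, bw_hz=40.0, floor=1e-3):
    """Assumed striation model: Gaussian lobes of width bw_hz at k * f0 below FS/2 (f0 > 0)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / FS)
    harmonics = np.arange(f0_hz, FS / 2, f0_hz)
    lobes = np.exp(-0.5 * ((freqs[:, None] - harmonics[None, :]) / bw_hz) ** 2)
    return lobes.sum(axis=1) + floor  # floor keeps the log MFB energies finite

def mel_filterbank(n_filt=24, n_fft=N_FFT, f_lo=200.0, f_hi=3500.0):
    """Triangular filters equally spaced on the mel scale."""
    def to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def from_mel(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    edges = from_mel(np.linspace(to_mel(f_lo), to_mel(f_hi), n_filt + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / FS)
    fb = np.zeros((n_filt, freqs.size))
    for i in range(n_filt):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mfb_cepstra(power, fb, n_ceps=19):
    """Log mel-filter-bank energies followed by a DCT-II; c_0 is dropped."""
    log_e = np.log(fb @ power)
    m = np.arange(log_e.size)
    k = np.arange(1, n_ceps + 1)[:, None]
    return np.cos(np.pi * k * (m + 0.5) / log_e.size) @ log_e

# Usage with a toy A(z) and a 125 Hz fundamental (both hypothetical values)
a = np.array([1.0, -0.9])
fb = mel_filterbank()
envelope_ceps = mfb_cepstra(allpole_power(a), fb)
striated_ceps = mfb_cepstra(allpole_power(a) * harmonic_comb(125.0), fb)
```

Multiplying the comb onto the smooth envelope is one simple way to reintroduce pitch-related fine structure before the filter bank; the lobe width `bw_hz` controls how sharp the striations are.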
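For context on how such per-frame feature vectors are scored in GMM speaker recognition, the sketch below computes the average per-frame log-likelihood under a diagonal-covariance mixture and takes a verification score as the difference between a target-speaker model and a background model. The diagonal-covariance choice, the function names, and the toy model parameters are assumptions for illustration; the abstract does not describe how the models were trained or how scores were normalized.

```python
# Illustrative sketch; function names and toy model parameters are hypothetical.
import numpy as np

def gmm_avg_loglik(X, weights, means, variances):
    """Mean per-frame log p(x) under a diagonal-covariance GMM.

    X: (T, D) frames; weights: (M,); means, variances: (M, D).
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    diff = X[:, None, :] - means[None, :, :]                   # (T, M, D)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances[None], axis=2) + log_norm[None, :])
    log_comp += np.log(weights)[None, :]                        # (T, M)
    peak = log_comp.max(axis=1, keepdims=True)                  # log-sum-exp for stability
    per_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return per_frame.mean()

def verification_score(X, target, background):
    """Average log-likelihood ratio of a target model against a background model."""
    return gmm_avg_loglik(X, *target) - gmm_avg_loglik(X, *background)

# Usage with two toy one-dimensional, single-component models
target = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
background = (np.array([1.0]), np.array([[3.0]]), np.array([[1.0]]))
score = verification_score([[0.0], [0.2]], target, background)  # positive: frames favor target
```

A positive score means the frames are better explained by the target model than by the background model; in practice a decision threshold is applied to this score.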