Investigation on LP-residual representations for speaker identification

Authors:
M. Chetouani;M. Faundez-Zanuy;B. Gas;J. L. Zarader
Affiliations:
Université Pierre et Marie Curie (UPMC), 4 Place Jussieu, 75252 Paris Cedex 05, France;Escola Universitària Politècnica de Mataró, Barcelona, Spain;Université Pierre et Marie Curie (UPMC), 4 Place Jussieu, 75252 Paris Cedex 05, France;Université Pierre et Marie Curie (UPMC), 4 Place Jussieu, 75252 Paris Cedex 05, France
Venue:
Pattern Recognition
Year:
2009

Abstract

Feature extraction is an essential step in speaker recognition systems. In this paper, we propose to improve these systems by exploiting both conventional features, such as mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC), and non-conventional ones. The method exploits information present in the linear predictive (LP) residual signal. The features extracted from the LP residual are then combined with the MFCC or the LPCC. We investigate two approaches, termed temporal and frequential representations. The first consists of auto-regressive (AR) modelling of the residual signal followed by a cepstral transformation, in a similar way to the LPC-to-LPCC transformation. To take into account the non-linear nature of speech signals, we use two estimation methods based on second- and third-order statistics. They are termed, respectively, R-SOS-LPCC (residual plus second-order-statistics-based estimation of the AR model plus cepstral transformation) and R-HOS-LPCC (higher order). For the frequential approach, we exploit a filter bank method called the power difference of spectra in sub-band (PDSS), which measures the spectral flatness over the sub-bands. The resulting features are named R-PDSS. The proposed schemes are analysed on a speaker identification problem with two different databases. The first is the Gaudi database, which contains 49 speakers; its main interest lies in the controlled acquisition conditions: mismatch between the microphones and the interval between sessions. The second is the well-known NTIMIT corpus, with 630 speakers. The performance of the features is confirmed on this larger corpus. In addition, we propose to compare traditional and residual features through the fusion of recognizers (feature extractor + classifier).
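As an illustration only, the temporal and frequential representations described here can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the LP order of 10, the 12 cepstral coefficients, the 8 equal-width sub-bands and the synthetic test frame are all hypothetical choices. PDSS is assumed to be one minus the ratio of the geometric to the arithmetic mean of the residual power spectrum in each sub-band. Only the second-order (autocorrelation) estimation is shown; the third-order variant is not.

```python
import cmath
import math
import random

def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_order for x_hat[t] = sum_k a_k * x[t - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k_i = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k_i
        for j in range(1, i):
            a[j] = prev[j] - k_i * prev[i - j]
        err *= 1.0 - k_i * k_i
    return a[1:]

def lp_residual(x, a):
    """Inverse filtering: e[t] = x[t] - sum_k a_k * x[t - k] (a[k] holds a_{k+1})."""
    p = len(a)
    return [x[t] - sum(a[k] * x[t - 1 - k] for k in range(p) if t - 1 - k >= 0)
            for t in range(len(x))]

def lpc_to_cepstrum(a, n_ceps):
    """Standard recursion for the cepstrum of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def pdss(e, n_bands):
    """Assumed PDSS: 1 - geometric mean / arithmetic mean of the power spectrum per band."""
    n = len(e)
    power = []
    for k in range(1, n // 2 + 1):
        s = sum(e[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2 + 1e-12)  # small floor avoids log(0)
    width = len(power) // n_bands
    feats = []
    for b in range(n_bands):
        band = power[b * width:(b + 1) * width]
        geo = math.exp(sum(math.log(v) for v in band) / len(band))
        feats.append(1.0 - geo / (sum(band) / len(band)))
    return feats

# Synthetic "voiced" frame (hypothetical data): a pulse train plus a little
# noise, passed through a stable second-order all-pole filter.
random.seed(0)
frame = [0.0] * 256
for t in range(256):
    excitation = (1.0 if t % 64 == 0 else 0.0) + 0.01 * random.gauss(0.0, 1.0)
    frame[t] = (excitation
                + (1.3 * frame[t - 1] if t >= 1 else 0.0)
                - (0.6 * frame[t - 2] if t >= 2 else 0.0))

a = levinson_durbin(autocorrelation(frame, 10), 10)              # LP analysis of the frame
residual = lp_residual(frame, a)                                 # LP residual
a_res = levinson_durbin(autocorrelation(residual, 10), 10)       # AR model of the residual
temporal_features = lpc_to_cepstrum(a_res, 12)                   # R-SOS-LPCC-like vector
frequential_features = pdss(residual, 8)                         # R-PDSS-like vector
```

In this sketch a flat (noise-like) sub-band gives a PDSS value near 0, while a sub-band dominated by a few harmonic peaks gives a value closer to 1, which is why the measure can capture harmonic structure left in the residual.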
The results show that residual features carry speaker-dependent information and that combining them with the LPCC or the MFCC yields overall improvements in robustness under different mismatches. A comparison of the residual features under the opinion fusion framework gives useful information about the potential of both temporal and frequential representations.
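The abstract does not specify the fusion rule, so the following is only a hedged sketch of one common way to fuse the opinions of two recognizers: min-max normalisation of each recognizer's per-speaker scores followed by a weighted sum. The function names, the weight and the score values are hypothetical and purely illustrative; they are not results from the paper.

```python
def min_max_normalize(scores):
    """Map one recognizer's per-speaker scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {spk: (s - lo) / span if span > 0 else 0.0 for spk, s in scores.items()}

def fuse_opinions(scores_a, scores_b, weight_a=0.5):
    """Weighted-sum fusion of two recognizers' normalised scores."""
    na, nb = min_max_normalize(scores_a), min_max_normalize(scores_b)
    return {spk: weight_a * na[spk] + (1.0 - weight_a) * nb[spk] for spk in na}

def identify(scores):
    """Closed-set identification: pick the speaker with the highest score."""
    return max(scores, key=scores.get)

# Made-up log-likelihood scores for three enrolled speakers.
mfcc_scores = {"spk1": -41.0, "spk2": -40.0, "spk3": -47.0}
residual_scores = {"spk1": -30.0, "spk2": -36.0, "spk3": -35.0}

fused = fuse_opinions(mfcc_scores, residual_scores, weight_a=0.5)
decision = identify(fused)
```

With these made-up numbers the conventional recognizer alone would pick `spk2`, whereas the fused decision is `spk1`, showing how a second opinion from residual features can change the outcome.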