Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition

Authors:
Md. Sahidullah;Goutam Saha
Affiliations:
Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, Kharagpur 721 302, India;Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, Kharagpur 721 302, India
Venue:
Speech Communication
Year:
2012

Abstract

The standard Mel frequency cepstral coefficient (MFCC) computation uses the discrete cosine transform (DCT) to decorrelate the log energies of the filter bank outputs. The DCT is a reasonable choice because the covariance matrix of the Mel filter bank log energies (MFLE) resembles that of a highly correlated Markov-I process. This full-band MFCC computation, in which every filter bank output contributes to every coefficient, has two main disadvantages. First, the covariance matrix of the log energies does not exactly follow the Markov-I property. Second, full-band MFCC features degrade severely when the speech signal is corrupted by narrow-band channel noise, even though a few filter bank outputs may remain unaffected. In this work, we study a class of linear transformations based on block-wise transformation of the MFLE, which effectively decorrelate the filter bank log energies and capture speech information efficiently. We examine the block-based transformation approach in detail, including a new partitioning technique and the advantages it offers. This article also reports a novel feature extraction scheme that captures information complementary to the wide-band information, which otherwise remains undetected by both standard MFCC and the proposed block transform (BT) techniques. The proposed features are evaluated on NIST SRE databases using a Gaussian mixture model-universal background model (GMM-UBM) based speaker recognition system. We obtain significant performance improvement over baseline features in both matched and mismatched conditions, and under both standard and narrow-band noise. The proposed method also achieves significant performance improvement in the presence of narrow-band noise when combined with a missing feature theory based score computation scheme.
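The contrast between a full-band DCT and a block-wise transform can be illustrated with a minimal sketch. It is not the authors' implementation: the 20-channel filter bank, the synthetic log-energy values, the two non-overlapping blocks of 10 channels, and the helper names (`dct2_matrix`, `dct2`, `block_dct`, `changed_indices`) are all assumptions made for illustration, and the partitioning studied in the paper may differ. The sketch perturbs a single filter bank log energy, as a stand-in for narrow-band noise, and reports which output coefficients change under each transform.

```python
import math


def dct2_matrix(n):
    """Return the orthonormal n x n DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def dct2(x):
    """Full-band transform: every input contributes to every coefficient."""
    return [sum(c * v for c, v in zip(row, x)) for row in dct2_matrix(len(x))]


def block_dct(x, block_sizes):
    """Apply an independent DCT-II to each consecutive, non-overlapping block."""
    if sum(block_sizes) != len(x):
        raise ValueError("block sizes must sum to the input length")
    out, start = [], 0
    for size in block_sizes:
        out.extend(dct2(x[start:start + size]))
        start += size
    return out


def changed_indices(a, b, tol=1e-9):
    """Indices at which two coefficient vectors differ by more than tol."""
    return [i for i, (u, v) in enumerate(zip(a, b)) if abs(u - v) > tol]


# Hypothetical 20-channel MFLE vector (synthetic values, not real speech).
clean = [math.log(1.0 + 0.5 * (i + 1)) for i in range(20)]

# Stand-in for narrow-band noise: corrupt one channel in the upper block.
noisy = list(clean)
noisy[13] += 2.0

blocks = [10, 10]  # assumed partition, chosen only for this illustration
full_changed = changed_indices(dct2(clean), dct2(noisy))
block_changed = changed_indices(block_dct(clean, blocks), block_dct(noisy, blocks))

print("full-band coefficients affected:", len(full_changed))   # 20
print("block-wise coefficients affected:", block_changed)      # indices 10..19
```

In this toy setting the perturbation spreads to all 20 full-band coefficients, while the block-wise transform confines it to the 10 coefficients of the affected block and leaves the other block untouched. That containment is the kind of property a missing-feature scoring scheme could exploit by ignoring the corrupted block.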