The standard Mel frequency cepstral coefficient (MFCC) computation uses the discrete cosine transform (DCT) to decorrelate the log energies of the filter bank outputs. The DCT is a reasonable choice here because the covariance matrix of the Mel filter bank log energies (MFLE) resembles that of a highly correlated first-order Markov (Markov-I) process. This full-band MFCC computation, in which every filter bank output contributes to all coefficients, has two main disadvantages. First, the covariance matrix of the log energies does not exactly follow the Markov-I property. Second, full-band MFCC features degrade severely when the speech signal is corrupted by narrow-band channel noise, even though some filter bank outputs may remain unaffected. In this work, we study a class of linear transformations based on block-wise transformation of the MFLE, which effectively decorrelate the filter bank log energies and also capture speech information efficiently. We carry out a thorough study of the block-based transformation approach by investigating a new partitioning technique and highlighting its advantages. This article also reports a novel feature extraction scheme that captures information complementary to the wide-band information, which otherwise remains undetected by both standard MFCC and the proposed block transform (BT) techniques. The proposed features are evaluated on NIST SRE databases using a Gaussian mixture model-universal background model (GMM-UBM) based speaker recognition system. We obtain significant performance improvement over the baseline features in both matched and mismatched conditions, and for both standard and narrow-band noises. The proposed method also achieves significant performance improvement in the presence of narrow-band noise when combined with a score computation scheme based on missing feature theory.
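The contrast between the full-band DCT and a block-wise transform can be illustrated with a minimal sketch. It is not the authors' implementation: the 20-channel filter bank, the two equal blocks of 10 channels, the synthetic log energies, and the helper names `dct2`, `block_dct`, and `changed_indices` are all assumptions made for illustration, and the partitioning technique proposed in the article is not reproduced here.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x))
        out.append(scale * acc)
    return out


def block_dct(log_energies, block_sizes):
    """Apply a separate DCT-II to each non-overlapping block of log energies."""
    if sum(block_sizes) != len(log_energies):
        raise ValueError("block sizes must sum to the number of filter bank channels")
    out, start = [], 0
    for size in block_sizes:
        out.extend(dct2(log_energies[start:start + size]))
        start += size
    return out


def changed_indices(a, b, tol=1e-9):
    """Indices at which two coefficient vectors differ by more than tol."""
    return [i for i, (u, v) in enumerate(zip(a, b)) if abs(u - v) > tol]


# Synthetic, smoothly varying log energies for 20 channels (illustrative only).
clean = [10.0 - 0.3 * i + math.sin(0.4 * i) for i in range(20)]

# Simulate narrow-band noise that corrupts a single channel in the upper band.
noisy = list(clean)
noisy[15] += 5.0

# Hypothetical partition: two equal blocks of 10 channels each.
blocks = [10, 10]

full_changed = changed_indices(dct2(clean), dct2(noisy))
block_changed = changed_indices(block_dct(clean, blocks), block_dct(noisy, blocks))

print("full-band coefficients affected:", len(full_changed))
print("block-wise coefficients affected:", len(block_changed))
```

In this sketch the corrupted channel spreads into every full-band coefficient, while the block-wise transform confines the damage to the coefficients of the block containing that channel. The coefficients of the unaffected block stay clean, which is the property that a missing-feature style scoring scheme can exploit.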