Auditory sparse representation for robust speaker recognition based on tensor structure

Authors:
Qiang Wu; Liqing Zhang
Affiliations:
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China;Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Venue:
EURASIP Journal on Audio, Speech, and Music Processing - Intelligent Audio, Speech, and Music Processing Applications
Year:
2008

Abstract

This paper investigates speaker recognition in noisy conditions. We propose a new approach to speech feature extraction, nonnegative tensor principal component analysis (NTPCA) with a sparse constraint. Speech is encoded as a general higher-order tensor so that discriminative features can be extracted in the spectrotemporal domain. First, speech signals are represented by cochlear features based on the frequency selectivity of the basilar membrane and inner hair cells; then, low-dimensional sparse features are extracted by NTPCA for robust speaker modeling. This preserves the useful information in each subspace of the higher-order tensor. An alternating projection algorithm is used to obtain a stable solution. Experimental results demonstrate that the method improves recognition accuracy, particularly in noisy environments.
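
The abstract does not state the NTPCA objective or its update rules, so the following is only an illustrative sketch of the general idea of alternating projection over tensor modes, not the authors' algorithm. It assumes each utterance is already a nonnegative frequency-by-time matrix (a stand-in for the cochlear features), and it swaps in a simple heuristic for the nonnegativity and sparseness constraints: take the leading eigenvectors of each mode's scatter matrix, fix their sign, soft-threshold, and clip at zero. All function names, the `lam` threshold, and the rank choices are hypothetical.

```python
import numpy as np

def mode_scatter(Xc, U_other, mode):
    """Scatter matrix of one mode after projecting the other mode.

    Xc has shape (n_samples, F, T) and is assumed to be mean-centered.
    """
    if mode == 0:
        Y = Xc @ U_other                            # (n, F, r_time)
    else:
        Y = np.transpose(Xc, (0, 2, 1)) @ U_other   # (n, T, r_freq)
    # Sum over samples of Y_i @ Y_i.T
    return np.einsum("nij,nkj->ik", Y, Y)

def sparse_nonneg_basis(S, rank, lam):
    """Heuristic (assumption): leading eigenvectors, made nonnegative and sparse."""
    _, V = np.linalg.eigh(S)            # eigenvalues in ascending order
    V = V[:, ::-1][:, :rank]            # keep the leading `rank` eigenvectors
    # Flip each column so its entries sum to a nonnegative value.
    V = V * np.where(V.sum(axis=0) < 0, -1.0, 1.0)
    # Soft-threshold relative to each column's maximum, then clip at zero.
    # With 0 <= lam < 1 the largest entry of every column stays positive.
    V = np.maximum(V - lam * V.max(axis=0), 0.0)
    return V / np.linalg.norm(V, axis=0)

def fit_projections(X, ranks, lam=0.1, n_iter=10, seed=0):
    """Alternate between the frequency and time modes, holding the other fixed."""
    rng = np.random.default_rng(seed)
    _, F, T = X.shape
    Xc = X - X.mean(axis=0)
    U = [np.abs(rng.standard_normal((F, ranks[0]))),
         np.abs(rng.standard_normal((T, ranks[1])))]
    for _ in range(n_iter):
        U[0] = sparse_nonneg_basis(mode_scatter(Xc, U[1], 0), ranks[0], lam)
        U[1] = sparse_nonneg_basis(mode_scatter(Xc, U[0], 1), ranks[1], lam)
    return U

def project(X, U):
    """Low-dimensional features: U[0].T @ X_i @ U[1] for every sample."""
    return np.einsum("nft,fa,tb->nab", X, U[0], U[1])
```

Under these assumptions, `fit_projections(X, ranks=(3, 2))` on an array `X` of shape `(n_samples, F, T)` returns one nonnegative projection matrix per mode, and `project(X, U)` yields the reduced features that a speaker model would consume. A fixed iteration count stands in for a convergence test.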