Recognizing the message and the messenger: biomimetic spectral analysis for robust speech and speaker recognition

Authors:
Sridhar Krishna Nemala;Kailash Patil;Mounya Elhilali
Affiliations:
Department of Electrical and Computer Engineering, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, USA;Department of Electrical and Computer Engineering, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, USA;Department of Electrical and Computer Engineering, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, USA
Venue:
International Journal of Speech Technology
Year:
2013

Abstract

Humans are quite adept at communicating in the presence of noise. However, most speech processing systems, such as automatic speech and speaker recognition systems, suffer a significant drop in performance when speech signals are corrupted by unseen background distortions. The proposed work explores the use of a biologically motivated multi-resolution spectral analysis for speech representation. This approach focuses on the information-rich spectral attributes of speech and, through careful choice of model parameters, provides an intricate yet computationally efficient analysis of the speech signal. Further, the approach takes advantage of an information-theoretic analysis of the message-dominant and speaker-dominant regions in the speech signal, and defines feature representations that address two diverse tasks: speech recognition and speaker recognition. The proposed analysis surpasses the standard Mel-Frequency Cepstral Coefficients (MFCC) and their enhanced variants (via mean subtraction, variance normalization, and time-sequence filtering), and yields significant improvements over a state-of-the-art noise-robust feature scheme on both speech and speaker recognition tasks.
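
The enhanced MFCC baselines mentioned above combine three post-processing steps: mean subtraction, variance normalization, and time-sequence filtering. The following is a minimal sketch of that kind of pipeline, not the authors' implementation. It assumes per-utterance statistics and a simple ARMA-style smoothing filter, and leaves edge frames unsmoothed. The function name `mva_normalize` and the `order` parameter are hypothetical, chosen only for illustration.

```python
from statistics import fmean, pstdev


def mva_normalize(frames, order=2):
    """Apply mean subtraction, variance normalization, and ARMA-style
    smoothing to a sequence of feature vectors (one list per time frame).

    Assumption: statistics are computed per utterance and per feature
    dimension; the first and last `order` frames are left unsmoothed.
    """
    n_frames = len(frames)
    n_dims = len(frames[0])
    out = [[0.0] * n_dims for _ in range(n_frames)]

    for d in range(n_dims):
        # Trajectory of one feature dimension over time.
        track = [frame[d] for frame in frames]

        # Mean subtraction and variance normalization.
        mean = fmean(track)
        std = pstdev(track)
        scale = std if std > 0 else 1.0  # guard against constant tracks
        norm = [(v - mean) / scale for v in track]

        # ARMA-style smoothing: average the previous `order` smoothed
        # values with the current and next `order` normalized values.
        smoothed = list(norm)
        for t in range(order, n_frames - order):
            past = sum(smoothed[t - order:t])
            future = sum(norm[t:t + order + 1])
            smoothed[t] = (past + future) / (2 * order + 1)

        for t in range(n_frames):
            out[t][d] = smoothed[t]

    return out
```

In this sketch, each feature dimension is normalized independently, so a constant channel offset or gain applied to a whole utterance is removed before smoothing. The smoothing step attenuates fast frame-to-frame fluctuations, which are more likely to come from noise than from speech.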