Noise-robust speech recognition through auditory feature detection and spike sequence decoding

Authors:
Phillip B. Schafer; Dezhe Z. Jin
Venue:
Neural Computation
Year:
2014


Abstract

Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound through its sequence of spikes. We compare two methods for decoding the spike sequences: one uses a hidden Markov model-based recognizer, and the other uses a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common subsequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
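To make the template-based decoding idea concrete, here is a minimal sketch of matching spike sequences by longest-common-subsequence (LCS) length. It rests on several assumptions that the abstract does not spell out. A spike sequence is taken to be the list of firing-neuron IDs ordered by spike time. The similarity is LCS length divided by the length of the longer sequence, a hypothetical normalization. Each word is scored by its best-matching clean template. The helper names (`lcs_length`, `similarity`, `spikes_to_sequence`, `classify`) are illustrative only. This is not the authors' exact similarity measure or implementation.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def lcs_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest common subsequence of a and b (standard O(len(a)*len(b)) DP)."""
    # prev[j] holds the LCS length of the prefix of a processed so far and b[:j];
    # keeping only one previous row makes memory O(len(b)).
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip an element of a or b
        prev = curr
    return prev[-1]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    # Hypothetical normalization (assumption): LCS length over the longer sequence's length.
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def spikes_to_sequence(spikes: List[Tuple[float, int]]) -> List[int]:
    # Assumed representation: (spike_time, neuron_id) events; order by time, keep neuron IDs.
    return [neuron for _, neuron in sorted(spikes)]


def classify(test_seq: Sequence[int], templates: Dict[str, List[List[int]]]) -> Optional[str]:
    # Score each word by its best-matching clean template and return the top-scoring word.
    # Assumes every word has at least one template sequence.
    best_word, best_score = None, -1.0
    for word, seqs in templates.items():
        score = max(similarity(test_seq, t) for t in seqs)
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

For example, a noisy test sequence `[3, 6, 7, 2, 11, 9]`, which contains spurious spikes, still shares the subsequence `[3, 7, 2, 9]` with a clean template `[3, 7, 7, 2, 9]`. With the toy templates `{"one": [[3, 7, 7, 2, 9]], "two": [[5, 1, 4, 8]]}`, `classify` therefore returns `"one"`. The design choice illustrated here is that an LCS tolerates inserted or missing spikes, because it only requires the shared spikes to occur in the same relative order.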