Word recognition with a hierarchical neural network

Authors:
Xavier Domont; Martin Heckmann; Heiko Wersing; Frank Joublin; Stefan Menzel; Bernhard Sendhoff; Christian Goerick
Affiliations:
Honda Research Institute Europe GmbH, Offenbach am Main, Germany and Technische Universität Darmstadt, Control Theory and Robotics Lab, Darmstadt, Germany; Honda Research Institute Europe GmbH, Offenbach am Main, Germany; Honda Research Institute Europe GmbH, Offenbach am Main, Germany; Honda Research Institute Europe GmbH, Offenbach am Main, Germany; Honda Research Institute Europe GmbH, Offenbach am Main, Germany; Honda Research Institute Europe GmbH, Offenbach am Main, Germany; Honda Research Institute Europe GmbH, Offenbach am Main, Germany
Venue:
NOLISP'07 Proceedings of the 2007 international conference on Advances in nonlinear speech processing
Year:
2007

Abstract

In this paper we propose a feedforward neural network for syllable recognition. The core of the recognition system is based on a hierarchical architecture initially developed for visual object recognition. We show that, given the similarities between the primary auditory and visual cortices, such a system can successfully be used for speech recognition. Syllables are used as basic units for the recognition. Their spectrograms, computed using a Gammatone filterbank, are interpreted as images and subsequently fed into the neural network after a preprocessing step that enhances the formant frequencies and normalizes the length of the syllables. The performance of our system has been analyzed on the recognition of 25 different monosyllabic words. The parameters of the architecture have been optimized using an evolutionary strategy. Compared to the Sphinx-4 speech recognition system, our system achieves better robustness and generalization capabilities in noisy conditions.
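
The length-normalization step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the syllable length is normalized, so the linear interpolation along the time axis used here is an assumption, as are the function name normalize_length, the (channels, frames) array layout, and the default of 100 target frames.

```python
import numpy as np

def normalize_length(spectrogram, target_frames=100):
    """Resample a (channels, frames) spectrogram to a fixed number of frames.

    Assumption: linear interpolation along time; the paper's actual
    normalization method is not described in the abstract.
    """
    spectrogram = np.asarray(spectrogram, dtype=float)
    n_channels, n_frames = spectrogram.shape
    # Place the original and the resampled frames on a common [0, 1] time axis.
    src = np.linspace(0.0, 1.0, n_frames)
    dst = np.linspace(0.0, 1.0, target_frames)
    # Interpolate each frequency channel independently along time.
    return np.stack([np.interp(dst, src, row) for row in spectrogram])
```

With such a step, syllables of different durations yield spectrogram "images" of identical size, which is what a feedforward network with a fixed input layer would require.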