Audio-Visual Speech Recognition One Pass Learning with Spiking Neurons

  • Authors:
  • Renaud Séguier;David Mercier

  • Affiliations:
  • -;-

  • Venue:
  • ICANN '02 Proceedings of the International Conference on Artificial Neural Networks
  • Year:
  • 2002


Abstract

We present a new application for spiking (impulse) neurons: audio-visual speech recognition. The features extracted from the audio (cepstral coefficients) and the video (height and width of the mouth, percentage of black and white pixels in the mouth) are simple enough to allow real-time integration of the complete system. A generic preprocessing step converts these features into an impulse sequence processed by the neural network, which carries out the classification. Training is done in one pass: the user pronounces each word of the dictionary once. Tests on the European M2VTS database show the value of such a system for audio-visual speech recognition. In the presence of noise in particular, audio-visual recognition is much better than recognition based on the audio modality alone.
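The abstract does not specify how the generic preprocessing maps feature values to impulses. A minimal sketch of one common scheme, latency (time-to-first-spike) coding, is shown below, purely as an illustration: the function name, the `t_max` window, and the example feature values are all assumptions, not taken from the paper.

```python
import numpy as np

def features_to_spike_times(features, t_max=10.0):
    """Convert a feature vector into one spike time per input neuron
    using latency coding: larger feature values fire earlier.

    `t_max` is a hypothetical encoding window (in ms); the paper's
    actual preprocessing parameters are not given in the abstract.
    """
    f = np.asarray(features, dtype=float)
    # Normalize each feature into [0, 1]; guard against a constant vector.
    span = f.max() - f.min()
    norm = (f - f.min()) / span if span > 0 else np.zeros_like(f)
    # Latency coding: a value of 1 fires at t=0, a value of 0 at t=t_max.
    return (1.0 - norm) * t_max

# Hypothetical input: audio features (cepstral coefficients) concatenated
# with video features (mouth height, mouth width, black/white pixel ratios).
features = [2.1, -0.5, 0.8, 12.0, 30.0, 0.4, 0.6]
spikes = features_to_spike_times(features)
```

With this encoding, the strongest feature (here `30.0`) produces the earliest spike, so the downstream spiking network can classify from relative spike timing rather than raw feature magnitudes.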