Using the Tandem Approach for AF Classification in an AVSR System

Authors:
Tian Gan;Wolfgang Menzel;Jianwei Zhang
Affiliations:
Cinacs, Department of Informatics, University of Hamburg, Hamburg, Germany 22527;Cinacs, Department of Informatics, University of Hamburg, Hamburg, Germany 22527;Cinacs, Department of Informatics, University of Hamburg, Hamburg, Germany 22527
Venue:
ISNN '08 Proceedings of the 5th international symposium on Neural Networks: Advances in Neural Networks, Part II
Year:
2008

Abstract

This paper describes an audio-visual speech recognition (AVSR) system based on articulatory features (AF). It implements a tandem approach in which artificial neural networks (ANN), specifically multi-layer perceptrons (MLP), serve as posterior probability estimators that transform raw input data into the more abstract articulatory features. Such an approach is particularly well suited when relatively little training data is available, a situation that is typical for AVSR. The paper also presents the MLP feature extraction results together with an analysis of recognition accuracy and confusions. Our AF-based AVSR system has been trained on the audio-visual speech corpus VIDTIMIT, which contains conversational speech based on a medium-sized vocabulary of more than 1200 words.
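
The tandem idea in the abstract can be illustrated with a minimal sketch. It assumes one small MLP per articulatory feature, each ending in a softmax so that its outputs can be read as posterior probabilities, and it assumes the log posteriors of all classifiers are concatenated into a single feature vector for a downstream recognizer. The AF inventory, layer sizes, random (untrained) weights, and all function names are hypothetical and chosen for illustration only; the abstract does not specify the paper's actual configuration.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_mlp(n_in, n_hidden, n_out, rng):
    """Build a one-hidden-layer MLP with random (untrained) weights."""
    def layer(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]
    return {"w1": layer(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "w2": layer(n_out, n_hidden), "b2": [0.0] * n_out}


def mlp_posteriors(mlp, frame):
    """Forward pass: frame -> tanh hidden layer -> softmax posteriors."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, frame)) + b)
              for row, b in zip(mlp["w1"], mlp["b1"])]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(mlp["w2"], mlp["b2"])]
    return softmax(scores)


def tandem_features(mlps, frame, floor=1e-10):
    """Concatenate log posteriors from every AF classifier into one vector."""
    features = []
    for name in sorted(mlps):  # fixed order keeps the vector layout stable
        posteriors = mlp_posteriors(mlps[name], frame)
        features.extend(math.log(max(p, floor)) for p in posteriors)
    return features


# Hypothetical AF inventory: feature name -> number of possible values.
AF_CLASSES = {"voicing": 2, "lip_rounding": 3, "lip_opening": 4}
FRAME_DIM = 6  # hypothetical size of one raw input frame

rng = random.Random(0)
mlps = {name: make_mlp(FRAME_DIM, 8, n_values, rng)
        for name, n_values in AF_CLASSES.items()}
frame = [rng.uniform(-1.0, 1.0) for _ in range(FRAME_DIM)]
features = tandem_features(mlps, frame)
```

In this sketch the length of `features` equals the total number of AF values across all classifiers (2 + 3 + 4 = 9). In a trained system the MLP weights would be learned from labeled frames, and the resulting vectors would replace or augment the raw observations given to the recognizer.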