Using the Tandem Approach for AF Classification in an AVSR System

  • Authors: Tian Gan; Wolfgang Menzel; Jianwei Zhang

  • Affiliations: Cinacs, Department of Informatics, University of Hamburg, Hamburg, Germany 22527 (all three authors)

  • Venue: ISNN '08 Proceedings of the 5th International Symposium on Neural Networks: Advances in Neural Networks, Part II
  • Year: 2008

Abstract

This paper describes an audio-visual speech recognition (AVSR) system based on articulatory features (AF). It implements a tandem approach in which artificial neural networks (ANN), specifically multi-layer perceptrons (MLP), serve as posterior probability estimators that transform the raw input data into the more abstract articulatory features. Such an approach is particularly well suited when relatively little training data is available, a situation that is typical for AVSR. We also present the MLP feature extraction results, together with an analysis of recognition accuracy and confusions. Our AF-based AVSR system has been trained on the audio-visual speech corpus VIDTIMIT, which contains conversational speech based on a medium-size vocabulary of more than 1200 words.
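
To make the tandem idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes one small MLP per articulatory feature group, each with a softmax output that is read as class posteriors, and it concatenates the log posteriors into one feature vector for a downstream recognizer. The feature groups, class counts, layer sizes, random (untrained) weights, and the log compression step are all illustrative assumptions; the abstract does not specify them.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax: turns scores into a probability vector."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def affine(W, b, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp_posteriors(x, params):
    """One-hidden-layer MLP whose softmax outputs act as class posteriors."""
    W1, b1, W2, b2 = params
    hidden = [math.tanh(h) for h in affine(W1, b1, x)]
    return softmax(affine(W2, b2, hidden))


def random_mlp(n_in, n_hidden, n_out, rng):
    """Untrained placeholder weights; a real system would train these."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return mat(n_hidden, n_in), [0.0] * n_hidden, mat(n_out, n_hidden), [0.0] * n_out


def tandem_features(x, mlps, eps=1e-10):
    """Concatenate log posteriors from every AF classifier into one vector."""
    feats = []
    for params in mlps.values():
        feats.extend(math.log(p + eps) for p in mlp_posteriors(x, params))
    return feats


rng = random.Random(0)
# Hypothetical AF groups and class counts, chosen only for illustration.
af_classes = {"voicing": 2, "place": 4, "rounding": 3}
mlps = {name: random_mlp(6, 8, k, rng) for name, k in af_classes.items()}

frame = [rng.gauss(0.0, 1.0) for _ in range(6)]  # stand-in for one input frame
features = tandem_features(frame, mlps)
print(len(features))
```

In a tandem system the resulting vector replaces or augments the raw acoustic and visual features passed to the recognizer, so the recognizer operates on a compact, more abstract representation.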