Visual information has been shown to improve the performance of speech recognition systems in noisy acoustic environments. However, most audio-visual speech recognizers rely on a clean visual signal. In this paper, we explore a novel approach to visual speech modeling, based on articulatory features, which has potential benefits under visually challenging conditions. The idea is to use a set of parallel classifiers to extract different articulatory attributes from the input images and then combine their decisions to obtain higher-level units, such as visemes or words. We evaluate our approach in a preliminary experiment on a small audio-visual database under several image noise conditions, and compare it to the standard viseme-based modeling approach.
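The abstract describes the architecture only at a high level. The following Python sketch is one way such a parallel-classifier scheme could be wired together; it assumes logistic-regression attribute classifiers, an invented three-attribute inventory (lip rounding, lip opening, labio-dental contact), a hypothetical viseme-to-attribute table, and a naive product rule for decision fusion. None of these specifics come from the paper itself.

```python
# Minimal sketch (not the authors' implementation): parallel articulatory-feature
# classifiers whose decisions are combined into viseme scores. The attribute names,
# viseme table, and fusion rule are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical articulatory attributes extracted from mouth-region images.
ATTRIBUTES = ["lip_rounding", "lip_opening", "labiodental_contact"]

# Toy training data: 200 frames of 20-dim visual features, one binary label per attribute.
X_train = rng.normal(size=(200, 20))
y_train = {a: rng.integers(0, 2, size=200) for a in ATTRIBUTES}

# One independent (parallel) classifier per articulatory attribute.
classifiers = {a: LogisticRegression(max_iter=1000).fit(X_train, y_train[a])
               for a in ATTRIBUTES}

# Hypothetical viseme definitions in terms of expected attribute values.
VISEMES = {
    "p/b/m": {"lip_rounding": 0, "lip_opening": 0, "labiodental_contact": 0},
    "f/v":   {"lip_rounding": 0, "lip_opening": 0, "labiodental_contact": 1},
    "o/u":   {"lip_rounding": 1, "lip_opening": 1, "labiodental_contact": 0},
}

def viseme_posteriors(x):
    """Combine per-attribute posteriors into viseme scores with a naive product rule."""
    p_attr = {a: clf.predict_proba(x.reshape(1, -1))[0] for a, clf in classifiers.items()}
    scores = {}
    for vis, spec in VISEMES.items():
        score = 1.0
        for a, target in spec.items():
            score *= p_attr[a][target]  # probability the attribute takes its expected value
        scores[vis] = score
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}

print(viseme_posteriors(rng.normal(size=20)))
```

The product rule is only one option for combining the parallel decisions; a weighted sum or a second-stage classifier over the concatenated attribute posteriors would fit the same parallel-classifier scheme, and the same fused scores could feed word-level decoding instead of visemes.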