Comparison of image transform-based features for visual speech recognition in clean and corrupted videos

  • Authors:
  • Rowan Seymour; Darryl Stewart; Ji Ming

  • Affiliations:
  • School of Electronics, Electrical Engineering and Computer Science, Queen's University of Belfast, Belfast, Northern Ireland, UK (all authors)

  • Venue:
  • Journal on Image and Video Processing - Anthropocentric Video Analysis: Tools and Applications
  • Year:
  • 2008


Abstract

We present results of a study into the performance of a variety of image transform-based feature types for speaker-independent visual speech recognition of isolated digits. This includes the first reported use of features extracted using a discrete curvelet transform. The study compares several methods for selecting features of each feature type and examines the relative benefits of static and dynamic visual features. The performance of the features is tested on clean video data and on video data corrupted in a variety of ways, to assess each feature type's robustness to potential real-world conditions. One of the test conditions involves a novel form of video corruption we call jitter, which simulates camera and/or head movement during recording.
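The abstract does not specify how jitter is generated; as an illustration only, one plausible realisation (the paper's actual parameters and method are not given here) is to translate each frame by a small random offset, mimicking camera or head movement. A minimal NumPy sketch under those assumptions:

```python
import numpy as np

def jitter_video(frames, max_shift=3, seed=0):
    """Hypothetical jitter corruption: shift each frame (H x W array)
    by a random offset in [-max_shift, max_shift] along both axes.
    Uses a wrap-around shift for simplicity; a real implementation
    might pad or crop instead."""
    rng = np.random.default_rng(seed)
    jittered = []
    for frame in frames:
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        jittered.append(np.roll(frame, (int(dy), int(dx)), axis=(0, 1)))
    return jittered

# Toy example: five 8x8 grayscale frames
video = [np.arange(64, dtype=float).reshape(8, 8) for _ in range(5)]
out = jitter_video(video)
print(len(out), out[0].shape)
```

The `max_shift` parameter and wrap-around behaviour are assumptions for this sketch, not details from the paper.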