Local spatiotemporal descriptors for visual recognition of spoken phrases

  • Authors:
  • Guoying Zhao, Matti Pietikäinen, Abdenour Hadid

  • Affiliations:
  • University of Oulu, Oulu, Finland

  • Venue:
  • Proceedings of the international workshop on Human-centered multimedia
  • Year:
  • 2007

Abstract

Visual speech information plays an important role in speech recognition under noisy conditions or for listeners with hearing impairment. In this paper, we propose local spatiotemporal descriptors to represent and recognize spoken isolated phrases based solely on visual input. Eye positions determined by a robust face and eye detector are used to localize the mouth regions in face images. Spatiotemporal local binary patterns extracted from these regions are used to describe phrase sequences. In our experiments with 817 sequences covering ten phrases and 20 speakers, promising accuracies of 62% and 70% were obtained in speaker-independent and speaker-dependent recognition, respectively. In comparison with other methods on the Tulips1 audio-visual database, our method's accuracy of 92.7% clearly outperforms the others. Advantages of our approach include local processing and robustness to monotonic gray-scale changes. Moreover, no error-prone segmentation of moving lips is needed.
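The spatiotemporal descriptors in the paper build on the 2-D local binary pattern (LBP) operator, whose thresholding step is what gives the claimed robustness to monotonic gray-scale changes. As a minimal sketch (not the authors' implementation, which extends LBP to spatiotemporal volumes), the basic 8-neighbour LBP and its histogram descriptor can be written as:

```python
import numpy as np

def lbp_basic(img):
    """Basic 8-neighbour LBP codes for a 2-D grayscale array.

    Each neighbour is thresholded against the centre pixel and the
    resulting bits are packed into an 8-bit code. Because only the
    ordering of gray values matters, the codes are unchanged by any
    monotonic gray-scale transform.
    """
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    # Neighbour offsets, clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:h - 1, 1:w - 1]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= center).astype(np.int32) << bit
    return codes.astype(np.uint8)

def lbp_histogram(img, bins=256):
    """Normalized histogram of LBP codes, usable as a region descriptor."""
    codes = lbp_basic(img)
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / hist.sum()
```

In the spatiotemporal setting, such histograms are computed not only in the image plane but also in planes spanning the time axis of the mouth-region sequence, and the concatenated histograms describe the spoken phrase.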