Lipreading with local spatiotemporal descriptors

  • Authors:
  • Guoying Zhao, Mark Barnard, and Matti Pietikäinen

  • Affiliations:
  • Guoying Zhao: Machine Vision Group, Infotech Oulu and Department of Electrical and Information Engineering, University of Oulu, Oulu, Finland
  • Mark Barnard: Machine Vision Group, University of Oulu, Oulu, Finland, and Faculty of Engineering and Physical Sciences, University of Surrey, Guildford, UK
  • Matti Pietikäinen: Machine Vision Group, Infotech Oulu and Department of Electrical and Information Engineering, University of Oulu, Oulu, Finland

  • Venue:
  • IEEE Transactions on Multimedia
  • Year:
  • 2009

Abstract

Visual speech information plays an important role in lipreading under noisy conditions or for listeners with a hearing impairment. In this paper, we present local spatiotemporal descriptors to represent and recognize spoken isolated phrases based solely on visual input. Spatiotemporal local binary patterns extracted from mouth regions are used to describe isolated phrase sequences. In our experiments with 817 sequences from ten phrases and 20 speakers, promising accuracies of 62% and 70% were obtained in speaker-independent and speaker-dependent recognition, respectively. In a comparison with other methods on the AVLetters database, our method's accuracy of 62.8% clearly outperforms the others. Analysis of the confusion matrix for the 26 English letters shows the good clustering characteristics of visemes for the proposed descriptors. The advantages of our approach include local processing and robustness to monotonic gray-scale changes. Moreover, no error-prone segmentation of the moving lips is needed.
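To illustrate the local binary pattern idea underlying these descriptors, the sketch below computes basic 3x3 LBP codes and a region-level histogram in plain Python. The function names are illustrative, not from the paper; the paper's spatiotemporal descriptor (LBP-TOP) extends this 2D operator to three orthogonal spatiotemporal planes, which is not reproduced here.

```python
def lbp_code(patch):
    """Basic 3x3 local binary pattern: threshold the 8 neighbors
    against the center pixel and pack the results into one byte.
    Illustrative sketch only, not the paper's exact LBP-TOP variant."""
    center = patch[1][1]
    # Neighbors taken clockwise, starting at the top-left pixel.
    neighbors = [patch[0][0], patch[0][1], patch[0][2],
                 patch[1][2], patch[2][2], patch[2][1],
                 patch[2][0], patch[1][0]]
    return sum(1 << i for i, p in enumerate(neighbors) if p >= center)

def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels: the kind of
    region-level feature that would describe one mouth-region block."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            patch = [row[x - 1:x + 2] for row in image[y - 1:y + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

Because the codes depend only on comparisons against the center pixel, adding a constant to every gray value leaves them unchanged, which is the monotonic gray-scale robustness mentioned in the abstract.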