Extraction of Visual Features for Lipreading

Authors:
Iain Matthews;Timothy F. Cootes;J. Andrew Bangham;Stephen Cox;Richard Harvey
Affiliations:
Carnegie Mellon Univ., Pittsburgh, PA;Univ. of Manchester, UK;Univ. of East Anglia, Norfolk, UK;Univ. of East Anglia, Norfolk, UK;Univ. of East Anglia, Norfolk, UK
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2002

Abstract

The multimodal nature of speech is often ignored in human-computer interaction, but lip deformations and other body motion, such as head movement, convey additional information. Humans integrate speech cues from many sources, and this improves intelligibility, especially when the acoustic signal is degraded. This paper shows how this additional, often complementary, visual speech information can be used for speech recognition. Three methods for parameterizing lip image sequences for recognition using hidden Markov models are compared. Two of these are top-down approaches that fit a model of the inner and outer lip contours and derive lipreading features from a principal component analysis of shape, or of shape and appearance, respectively. The third, bottom-up, method uses a nonlinear scale-space analysis to form features directly from the pixel intensity. All methods are compared on a multitalker visual speech recognition task of isolated letters.
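
The shape-based idea mentioned in the abstract (features from a principal component analysis of lip contour shape) can be illustrated with a small sketch. This is not the authors' implementation: the four-landmark lip layout, the toy data, and all function names are hypothetical, and only the first principal mode is computed, using power iteration so that the example needs nothing beyond the standard library.

```python
import math

def mean_shape_of(shapes):
    """Average each coordinate over all training shapes."""
    n = len(shapes)
    return [sum(col) / n for col in zip(*shapes)]

def covariance_of(shapes, mean):
    """Sample covariance matrix of the shape vectors (divides by n - 1)."""
    centered = [[x - m for x, m in zip(s, mean)] for s in shapes]
    d, n = len(mean), len(shapes)
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_mode(cov, iters=100):
    """Leading eigenvector of cov by power iteration (first PCA mode)."""
    d = len(cov)
    # Arbitrary start vector; power iteration only fails if the start
    # happens to be orthogonal to the leading mode.
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero covariance: no variation to model
        v = [x / norm for x in w]
    # Fix the sign so the first non-zero component is positive.
    lead = next((x for x in v if abs(x) > 1e-12), 1.0)
    return [x if lead > 0 else -x for x in v]

def shape_feature(shape, mean, mode):
    """Project a shape onto the mode: b = mode . (shape - mean)."""
    return sum((x - m) * e for x, m, e in zip(shape, mean, mode))

def toy_lip(opening):
    """Hypothetical 4-landmark lip: left corner, top, right corner, bottom,
    flattened as (x1, y1, x2, y2, ...)."""
    return [-1.0, 0.0, 0.0, opening, 1.0, 0.0, 0.0, -opening]

# Toy "image sequence": the mouth opens a little more in every frame.
frames = [toy_lip(h) for h in (0.1, 0.3, 0.5, 0.7)]
mean_shape = mean_shape_of(frames)
mode = first_mode(covariance_of(frames, mean_shape))
features = [shape_feature(f, mean_shape, mode) for f in frames]
print(features)
```

In this toy setting the single feature tracks mouth opening. In a recognizer of the kind the abstract describes, a vector of such per-frame parameters would be the observation sequence passed to a hidden Markov model.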