There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) Is using explicit lip motion information useful, and 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage spatial and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and lip motion features prove more valuable in the case of the speech-reading application.
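The selection criterion described above (features yielding the highest discrimination of individual speakers) can be sketched with a Fisher-style scatter-ratio score. This is a minimal illustrative sketch only: the `fisher_score` helper, the synthetic feature sets, and the candidate names are assumptions, not the paper's actual two-stage spatial/temporal analysis.

```python
import numpy as np

def fisher_score(X, y):
    """Ratio of between-class to within-class scatter.
    Larger values mean the features separate the classes (speakers) better.
    X: (n_samples, n_features) feature matrix, y: integer class labels."""
    mean_all = X.mean(axis=0)
    sb = 0.0  # between-class scatter
    sw = 0.0  # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        sb += len(Xc) * np.sum((mc - mean_all) ** 2)
        sw += np.sum((Xc - mc) ** 2)
    return sb / sw

rng = np.random.default_rng(0)
# Hypothetical lip-motion feature candidates for 3 speakers, 50 samples each:
# candidate A separates the speakers well; candidate B barely at all.
y = np.repeat([0, 1, 2], 50)
feat_a = rng.normal(loc=y[:, None] * 3.0, scale=0.5, size=(150, 4))
feat_b = rng.normal(loc=0.0, scale=1.0, size=(150, 4))

scores = {"dense_motion": fisher_score(feat_a, y),
          "contour_motion": fisher_score(feat_b, y)}
best = max(scores, key=scores.get)  # pick the most discriminative candidate
```

In the paper's framework the analogous scoring would be applied per spatial and temporal stage before feeding the selected features to the HMM-based recognizer.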