Maximising audiovisual correlation with automatic lip tracking and vowel based segmentation

  • Authors:
  • Andrew Abel;Amir Hussain;Quoc-Dinh Nguyen;Fabien Ringeval;Mohamed Chetouani;Maurice Milgram

  • Affiliations:
Dept. of Computing Science, University of Stirling, Scotland, UK (Abel, Hussain); Institute of Intelligent Systems and Robotics, University Pierre and Marie Curie-Paris 6, Paris, France (Nguyen, Ringeval, Chetouani, Milgram)

  • Venue:
BioID_MultiComm'09: Proceedings of the 2009 Joint COST 2101 and 2102 International Conference on Biometric ID Management and Multimodal Communication
  • Year:
  • 2009


Abstract

In recent years, the established link between the various human communication production domains has become more widely utilised in the field of speech processing. In this work, a state-of-the-art Semi-Adaptive Appearance Model (SAAM) approach developed by the authors is used for automatic lip tracking, and an adapted version of our vowel-based speech segmentation system is employed to segment speech automatically. Canonical Correlation Analysis (CCA), applied to segmented and non-segmented data in a range of noisy speech environments, shows that segmented speech exhibits significantly stronger audiovisual correlation, demonstrating the feasibility of our techniques for further development as part of a proposed audiovisual speech enhancement system.
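
To make the CCA step concrete, the following is a minimal sketch of computing the first canonical correlation between time-aligned audio and visual feature streams, using scikit-learn's CCA. The feature names, dimensions, and synthetic data here are illustrative assumptions, not the authors' actual features or pipeline.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Hypothetical feature matrices: rows are time-aligned frames.
# audio_feats: e.g. filterbank energies per frame (n_frames x n_audio_dims)
# visual_feats: e.g. lip-contour parameters per frame (n_frames x n_visual_dims)
# (Both are synthetic stand-ins for illustration only.)
rng = np.random.default_rng(0)
n_frames = 200
audio_feats = rng.standard_normal((n_frames, 23))
visual_feats = rng.standard_normal((n_frames, 6))

# Project both modalities onto their first canonical pair.
cca = CCA(n_components=1)
cca.fit(audio_feats, visual_feats)
audio_c, visual_c = cca.transform(audio_feats, visual_feats)

# The canonical correlation is the Pearson correlation of the projected variates.
rho = np.corrcoef(audio_c[:, 0], visual_c[:, 0])[0, 1]
print(f"First canonical correlation: {rho:.3f}")
```

In the setting described by the abstract, a correlation of this kind would be computed separately for the segmented and non-segmented streams, with the comparison of the two values indicating the benefit of vowel-based segmentation.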