Information theoretic feature extraction for audio-visual speech recognition

Authors:
Mihai Gurban;Jean-Philippe Thiran
Affiliations:
Signal Processing Laboratory, Ecole Polytechnique Fédérale de Lausanne, Ecublens, Switzerland;Signal Processing Laboratory, Ecole Polytechnique Fédérale de Lausanne, Ecublens, Switzerland
Venue:
IEEE Transactions on Signal Processing
Year:
2009

Abstract

The problem of feature selection has been thoroughly analyzed in the context of pattern classification, with the purpose of avoiding the curse of dimensionality. In multimodal signal processing, however, it has received less attention. Our approach to feature extraction is based on information theory and is applied to multimodal classification, in particular audio-visual speech recognition. In contrast to previous work on information-theoretic feature selection for multimodal signals, our proposed methods penalize features for their redundancy, yielding more compact feature sets and better performance. We propose two greedy algorithms for selecting visual features for audio-visual speech recognition: one penalizes a proportion of feature redundancy, while the other uses conditional mutual information as its evaluation measure. Our features perform better than linear discriminant analysis, the most commonly used transform for dimensionality reduction in the field, across a wide range of dimensionality values and when combined with audio at different quality levels.
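
The two greedy criteria named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything here is an assumption: the first score subtracts `beta` times the summed mutual information with already-selected features from the relevance (in the style of MIFS-type criteria), and the second takes the minimum conditional mutual information given each selected feature (in the style of CMIM). Features are assumed to be discrete, and the names `greedy_select`, `penalized_score`, `cmi_score`, and `beta` are hypothetical, not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy in bits of one or more discrete columns."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum(c / n * log2(c / n) for c in joint.values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def greedy_select(features, labels, k, score):
    """Repeatedly add the feature index with the highest score until k are chosen."""
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda j: score(features, labels, j, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


def penalized_score(beta):
    """Assumed criterion: relevance minus beta times summed redundancy."""
    def score(features, labels, j, selected):
        relevance = mutual_info(features[j], labels)
        redundancy = sum(mutual_info(features[j], features[s]) for s in selected)
        return relevance - beta * redundancy
    return score


def cmi_score(features, labels, j, selected):
    """Assumed criterion: worst-case I(feature; class | selected feature)."""
    if not selected:
        return mutual_info(features[j], labels)
    return min(cond_mutual_info(features[j], labels, features[s]) for s in selected)


# Toy data (hypothetical): the class is determined by two bits a and b.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * ai + bi for ai, bi in zip(a, b)]
features = [a, list(a), b]  # feature 1 is an exact duplicate of feature 0

no_penalty = greedy_select(features, labels, 2, penalized_score(0.0))
with_penalty = greedy_select(features, labels, 2, penalized_score(1.0))
with_cmi = greedy_select(features, labels, 2, cmi_score)
```

On this toy data every feature carries one bit about the class, so without a penalty the duplicate is as attractive as the complementary bit. With `beta = 1.0` or the conditional criterion, the duplicate scores zero once its twin is selected, and the second pick is the feature that adds new information.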