Although the accuracy of feature measurements depends heavily on changing environmental conditions, the consequences of this dependence for pattern recognition tasks have received relatively little attention to date. In this paper, we explicitly take feature measurement uncertainty into account and show how multimodal classification and learning rules should be adjusted to compensate for its effects. Our approach is particularly fruitful in multimodal fusion scenarios, such as audiovisual speech recognition, where multiple streams of complementary time-evolving features are integrated. For such applications, provided that the measurement noise uncertainty of each feature stream can be estimated, the proposed framework leads to highly adaptive multimodal fusion rules that are easy and efficient to implement. Our technique is widely applicable and can be transparently integrated with either synchronous or asynchronous multimodal sequence integration architectures. We further show that multimodal fusion methods relying on stream weights emerge naturally from our scheme under certain assumptions; this connection provides valuable insight into the adaptivity properties of our multimodal uncertainty compensation approach. We show how these ideas can be applied in practice to audiovisual speech recognition. In this context, we propose improved techniques for person-independent visual feature extraction and uncertainty estimation with active appearance models, and also discuss how enhanced audio features, along with their uncertainty estimates, can be computed effectively. We demonstrate the efficacy of our approach in audiovisual speech recognition experiments on the CUAVE database using both synchronous and asynchronous multimodal integration models.
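The core idea can be illustrated with a toy sketch (not the paper's actual implementation): for Gaussian stream models, compensating for measurement uncertainty amounts to inflating each class-conditional variance by the estimated measurement-noise variance before summing per-stream log-likelihoods, so a noisier stream automatically carries less weight in the fused decision. All class parameters and noise levels below are hypothetical.

```python
import math

def compensated_loglik(x, mu, model_var, meas_var):
    """Gaussian log-likelihood with the measurement-noise variance
    added to the model variance (uncertainty compensation)."""
    var = model_var + meas_var
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def fused_score(x_audio, x_visual, cls, noise_a, noise_v):
    """Sum of per-stream compensated log-likelihoods for one class.
    cls = ((mu_a, var_a), (mu_v, var_v)): hypothetical per-stream
    Gaussian parameters for this class."""
    (mu_a, var_a), (mu_v, var_v) = cls
    return (compensated_loglik(x_audio, mu_a, var_a, noise_a)
            + compensated_loglik(x_visual, mu_v, var_v, noise_v))

# Two hypothetical classes, each with (mean, variance) per stream.
classes = {
    "p": ((0.0, 1.0), (0.0, 1.0)),
    "b": ((3.0, 1.0), (3.0, 1.0)),
}
```

With clean audio favoring "b" and a corrupted visual stream favoring "p", the fused decision follows the more reliable stream; swapping the noise levels flips the outcome, mirroring how adaptive stream weighting emerges from the compensation rule.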