The construction of robust multimodal interfaces often requires large amounts of labeled training data to account for cross-user differences and variation in the environment. In this work, we investigate whether unlabeled training data can be leveraged to build more reliable audio-visual classifiers through co-training, a multi-view learning algorithm. Multimodal tasks are good candidates for multi-view learning, since each modality provides a potentially redundant view to the learning algorithm. We apply co-training to two problems: audio-visual speech unit classification, and user agreement recognition using spoken utterances and head gestures. We demonstrate that multimodal co-training can learn from only a few labeled examples in one or both of the audio-visual modalities. We also propose a co-adaptation algorithm, which adapts existing audio-visual classifiers to a particular user or noise condition by leveraging the redundancy in the unlabeled data.
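The co-training loop the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's actual models: the two "views" stand in for the audio and visual modalities, and the toy nearest-centroid classifier (with a distance-margin confidence) stands in for whatever view-specific classifier is used in practice. Each round, the more confident classifier pseudo-labels a few unlabeled examples, which then augment the shared training set.

```python
import random

class CentroidView:
    """Toy one-view classifier: nearest class centroid, with a
    distance-margin confidence (a stand-in for any per-modality model)."""
    def fit(self, xs, ys):
        self.cent = {}
        for c in set(ys):
            pts = [x for x, y in zip(xs, ys) if y == c]
            self.cent[c] = sum(pts) / len(pts)
        return self

    def predict(self, x):
        return min(self.cent, key=lambda c: abs(x - self.cent[c]))

    def confidence(self, x):
        # Margin between the two nearest class centroids.
        d = sorted(abs(x - m) for m in self.cent.values())
        return d[1] - d[0]

def co_train(labeled, unlabeled, rounds=10, per_round=2):
    """Simplified Blum & Mitchell-style co-training.
    labeled: list of ((view_a, view_b), label); unlabeled: list of (view_a, view_b).
    Each round, each view's classifier pseudo-labels its most confident
    unlabeled examples and adds them to the shared labeled pool."""
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        clf_a = CentroidView().fit([xa for (xa, _), _ in labeled],
                                   [y for _, y in labeled])
        clf_b = CentroidView().fit([xb for (_, xb), _ in labeled],
                                   [y for _, y in labeled])
        for clf, view in ((clf_a, 0), (clf_b, 1)):
            pool.sort(key=lambda x, v=view: clf.confidence(x[v]), reverse=True)
            for _ in range(min(per_round, len(pool))):
                x = pool.pop(0)
                labeled.append((x, clf.predict(x[view])))
    # Final classifiers, trained on the original plus pseudo-labeled data.
    clf_a = CentroidView().fit([xa for (xa, _), _ in labeled],
                               [y for _, y in labeled])
    clf_b = CentroidView().fit([xb for (_, xb), _ in labeled],
                               [y for _, y in labeled])
    return clf_a, clf_b

# Synthetic two-view data: class 0 near 0 and class 1 near 4 in both views,
# with only two labeled examples to start from.
random.seed(0)
def sample(c):
    return (c * 4 + random.gauss(0, 0.5), c * 4 + random.gauss(0, 0.5))

labeled = [((0.1, -0.2), 0), ((3.9, 4.2), 1)]
unlabeled = [sample(random.randint(0, 1)) for _ in range(40)]
clf_a, clf_b = co_train(labeled, unlabeled)
```

The key design point mirrored here is that each classifier sees only its own view, so a confident prediction in one modality supplies a training label for the other; co-adaptation would start from pre-trained classifiers and run the same loop on a new user's unlabeled data.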