Multimodal human behavior analysis is challenging due to the complex nonlinear correlations and interactions across modalities. We present a novel approach to this problem based on Kernel Canonical Correlation Analysis (KCCA) and Multi-view Hidden Conditional Random Fields (MV-HCRF). Our approach uses a nonlinear kernel to map multimodal data into a high-dimensional feature space and finds a projection of the data that maximizes the correlation across modalities. We then use a multi-chain structured graphical model with disjoint sets of latent variables, one set per modality, to jointly learn both view-shared and view-specific sub-structures of the projected data, capturing interactions across modalities explicitly. We evaluate our approach on the task of recognizing agreement and disagreement from nonverbal audio-visual cues using the Canal 9 dataset. Experimental results show that KCCA facilitates capturing nonlinear hidden dynamics and that MV-HCRF helps learn interactions across modalities.
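To make the first stage concrete, the following is a minimal sketch of regularized KCCA, not the authors' implementation: each view is mapped through a kernel (an RBF kernel is assumed here for illustration), the kernel matrices are centered, and the projection directions that maximize cross-view correlation are found by solving a generalized eigenproblem. The `gamma` and `reg` parameters are illustrative assumptions, not values from the paper.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gram matrix of the RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = np.sum(X ** 2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * dists)


def center_kernel(K):
    """Center the Gram matrix in feature space: H K H with H = I - 11^T/n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kcca(Kx, Ky, reg=1e-3):
    """Regularized kernel CCA on centered Gram matrices Kx, Ky.

    Solves (Kx + r I)^{-1} Ky (Ky + r I)^{-1} Kx a = rho^2 a, so the
    eigenvalues are squared canonical correlations and the eigenvectors
    are dual coefficients for the first view.
    """
    n = Kx.shape[0]
    r = reg * n
    Rx = np.linalg.solve(Kx + r * np.eye(n), Ky)   # (Kx + rI)^{-1} Ky
    Ry = np.linalg.solve(Ky + r * np.eye(n), Kx)   # (Ky + rI)^{-1} Kx
    vals, vecs = np.linalg.eig(Rx @ Ry)
    order = np.argsort(-vals.real)
    corrs = np.sqrt(np.clip(vals.real[order], 0.0, None))
    return corrs, vecs.real[:, order]


if __name__ == "__main__":
    # Two synthetic views driven by a shared latent variable t:
    # nonlinearly related, so linear CCA would miss the dependence.
    rng = np.random.default_rng(0)
    t = rng.uniform(-2.0, 2.0, size=(80, 1))
    X = np.hstack([t, t ** 2]) + 0.05 * rng.normal(size=(80, 2))
    Y = np.hstack([np.sin(t), np.cos(t)]) + 0.05 * rng.normal(size=(80, 2))

    Kx = center_kernel(rbf_kernel(X, gamma=0.5))
    Ky = center_kernel(rbf_kernel(Y, gamma=0.5))
    corrs, alphas = kcca(Kx, Ky, reg=1e-2)
    print("top canonical correlation:", corrs[0])
```

In the full pipeline described above, the projected representations of the audio and visual streams would then feed the multi-chain MV-HCRF rather than being used directly; the regularizer `reg` is what keeps the kernel correlations from degenerating to 1 on training data.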