Latent mixture of discriminative experts for multimodal prediction modeling

  • Authors:
  • Derya Ozkan; Kenji Sagae; Louis-Philippe Morency

  • Affiliations:
  • USC Institute for Creative Technologies (all three authors)

  • Venue:
  • COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
  • Year:
  • 2010

Abstract

During face-to-face conversation, people naturally integrate speech, gestures, and higher-level language interpretations to predict the right time to start talking or to give backchannel feedback. In this paper we introduce a new model, the Latent Mixture of Discriminative Experts, which addresses some of the key issues in multimodal language processing: (1) temporal synchrony/asynchrony between modalities, (2) micro-dynamics, and (3) integration of different levels of interpretation. We present an empirical evaluation on predicting listener nonverbal feedback (e.g., head nods) from observable behaviors of the speaker. We confirm the importance of combining four types of multimodal features: lexical, syntactic structure, eye gaze, and prosody. We show that our Latent Mixture of Discriminative Experts model outperforms previous approaches based on Conditional Random Fields (CRFs) and Latent-Dynamic CRFs (LDCRFs).
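
The abstract does not give the model's equations, but the core idea of a mixture of discriminative experts is that the prediction marginalizes over a latent choice of expert, p(y | x) = Σ_h p(h | x) · p(y | x, h), with one expert per modality. The sketch below is a deliberately simplified, numpy-only illustration of that idea, not the authors' implementation: it uses static logistic experts and a softmax gate in place of the paper's CRF-based experts with hidden temporal dynamics, and all feature dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class MixtureOfDiscriminativeExperts:
    """Illustrative sketch: p(y=1 | x) = sum_h p(h | x) * p(y=1 | x, expert_h)."""

    def __init__(self, dims):
        # One logistic "expert" (weight vector) per modality.
        self.experts = [rng.normal(0.0, 0.1, d) for d in dims]
        # Softmax gate over the concatenated features acts as the latent
        # mixture variable h, selecting how much to trust each expert.
        self.gate = rng.normal(0.0, 0.1, (sum(dims), len(dims)))

    def predict_proba(self, xs):
        # xs: list of per-modality feature vectors, one array per expert.
        expert_p = np.array([sigmoid(x @ w) for x, w in zip(xs, self.experts)])
        gate_p = softmax(np.concatenate(xs) @ self.gate)
        # Marginalize over the latent expert choice.
        return float(gate_p @ expert_p)

# Hypothetical feature dimensions for lexical, syntactic, gaze, and prosodic cues.
dims = [50, 20, 4, 12]
model = MixtureOfDiscriminativeExperts(dims)
xs = [rng.normal(size=d) for d in dims]
print("p(backchannel feedback):", model.predict_proba(xs))
```

The gate plays the role of the latent variable: rather than committing to one modality, the model weights each expert's opinion per input, which is what lets lexical, syntactic, gaze, and prosodic cues be integrated without forcing them into a single synchronized feature vector.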