During face-to-face conversation, people naturally integrate speech, gestures, and higher-level language interpretations to predict the right time to start talking or to give backchannel feedback. In this paper, we introduce a new model, the Latent Mixture of Discriminative Experts, which addresses some of the key issues in multimodal language processing: (1) temporal synchrony and asynchrony between modalities, (2) micro-dynamics, and (3) integration of different levels of interpretation. We present an empirical evaluation on predicting listener nonverbal feedback (e.g., head nods) from observable behaviors of the speaker. The results confirm the importance of combining four types of multimodal features: lexical, syntactic structure, eye gaze, and prosody. We show that our Latent Mixture of Discriminative Experts model outperforms previous approaches based on Conditional Random Fields (CRFs) and Latent-Dynamic CRFs.
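The abstract does not describe the model's internals, so the following Python sketch is only a rough illustration of the general idea of combining per-modality experts. It is not the Latent Mixture of Discriminative Experts itself. It assumes each of the four feature types (lexical, syntactic, eye gaze, prosody) already has an expert that outputs a per-frame probability of listener feedback, and it merges them with a fixed-weight average in place of the paper's latent mixture. The function names, weights, threshold, and probability values are all hypothetical.

```python
from typing import Dict, List


def mix_experts(expert_probs: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-frame expert probabilities.

    Assumption: a fixed-weight late fusion, used here as a simple
    stand-in for a learned latent mixture.
    """
    total = sum(weights[name] for name in expert_probs)
    n_frames = len(next(iter(expert_probs.values())))
    mixed = []
    for t in range(n_frames):
        score = sum(weights[name] * probs[t]
                    for name, probs in expert_probs.items())
        mixed.append(score / total)
    return mixed


def predict_feedback(mixed: List[float], threshold: float = 0.5) -> List[int]:
    """Return the frame indices that are local peaks at or above the threshold."""
    peaks = []
    for t, p in enumerate(mixed):
        left = mixed[t - 1] if t > 0 else float("-inf")
        right = mixed[t + 1] if t + 1 < len(mixed) else float("-inf")
        if p >= threshold and p > left and p >= right:
            peaks.append(t)
    return peaks


# Hypothetical per-frame outputs of four single-modality experts.
expert_probs = {
    "lexical":   [0.1, 0.2, 0.8, 0.3],
    "syntactic": [0.0, 0.4, 0.6, 0.2],
    "eye_gaze":  [0.2, 0.2, 0.9, 0.1],
    "prosody":   [0.1, 0.6, 0.7, 0.2],
}
weights = {name: 1.0 for name in expert_probs}  # equal weights, an assumption

mixed = mix_experts(expert_probs, weights)
feedback_frames = predict_feedback(mixed)
```

With these made-up values, only the third frame (index 2) is predicted as a feedback opportunity, because it is the only point where the experts jointly assign a high probability.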