Speaker-adaptive multimodal prediction model for listener responses

Authors:
Iwan de Kok; Dirk Heylen; Louis-Philippe Morency
Affiliations:
University of Twente, Enschede, Netherlands; University of Twente, Enschede, Netherlands; USC Institute for Creative Technologies, Los Angeles, USA
Venue:
Proceedings of the 15th ACM on International conference on multimodal interaction
Year:
2013

Abstract

The goal of this paper is to analyze and model the variability in speaking styles in dyadic interactions and to build a predictive algorithm for listener responses that adapts to these different styles. The end result of this research will be a virtual human able to respond automatically to a human speaker with appropriate listener responses (e.g., head nods). Our novel speaker-adaptive prediction model is created from a corpus of dyadic interactions in which speaker variability is analyzed to identify a subset of prototypical speaker styles. During a live interaction, our prediction model automatically identifies the closest prototypical speaker style and predicts listener responses based on this "communicative style". Central to our approach is the idea of a "speaker profile", which uniquely identifies each speaker and enables matching between prototypical speakers and new speakers. The paper demonstrates the merits of our speaker-adaptive listener response prediction model by showing improvement over a state-of-the-art approach that does not adapt to the speaker. Beyond the merits of speaker adaptation, our experiments highlight the importance of using multimodal features when comparing speakers to select the closest prototypical speaker style.
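The adaptation loop described in the abstract (summarize a speaker as a profile, find the closest prototypical style, then predict with a model specific to that style) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify which features form a speaker profile, how profiles are compared, or which model predicts responses. Here the features (`energy`, `pause`, `gaze_at_listener`), the use of per-feature means and Euclidean distance, the style names, and the toy per-style predictors are all hypothetical placeholders. The profile mixes prosodic and visual features to echo the abstract's point that multimodal features matter when matching speakers.

```python
from dataclasses import dataclass
from math import dist
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Frame:
    """One observation of the speaker; features assumed normalized to [0, 1]."""
    energy: float            # prosodic: normalized speech energy (hypothetical)
    pause: float             # prosodic: 1.0 if the speaker is pausing (hypothetical)
    gaze_at_listener: float  # visual: 1.0 if gazing at the listener (hypothetical)


# A style-specific model maps the current frame to a listener-response probability.
Predictor = Callable[[Frame], float]


def speaker_profile(frames: Sequence[Frame]) -> List[float]:
    """Summarize a speaker as the mean of each multimodal feature (assumed profile)."""
    return [
        mean(f.energy for f in frames),
        mean(f.pause for f in frames),
        mean(f.gaze_at_listener for f in frames),
    ]


def closest_style(profile: List[float], prototypes: Dict[str, List[float]]) -> str:
    """Return the prototypical style whose profile is nearest in Euclidean distance."""
    return min(prototypes, key=lambda name: dist(profile, prototypes[name]))


class SpeakerAdaptivePredictor:
    def __init__(self, prototypes: Dict[str, List[float]], models: Dict[str, Predictor]):
        self.prototypes = prototypes  # style name -> prototype speaker profile
        self.models = models          # style name -> model trained on that style

    def predict(self, observed: Sequence[Frame], current: Frame) -> Tuple[str, float]:
        # Re-estimate the speaker's profile from everything observed so far,
        # then let the model of the closest prototypical style make the prediction.
        style = closest_style(speaker_profile(observed), self.prototypes)
        return style, self.models[style](current)


# Hypothetical prototypes and toy per-style predictors, for illustration only.
prototypes = {"expressive": [0.8, 0.2, 0.7], "reserved": [0.3, 0.5, 0.2]}
models: Dict[str, Predictor] = {
    "expressive": lambda f: 0.9 if f.pause > 0.5 and f.gaze_at_listener > 0.5 else 0.1,
    "reserved": lambda f: 0.6 if f.pause > 0.5 else 0.05,
}
model = SpeakerAdaptivePredictor(prototypes, models)
history = [
    Frame(energy=0.8, pause=0.0, gaze_at_listener=1.0),
    Frame(energy=0.8, pause=0.4, gaze_at_listener=0.4),
]
print(model.predict(history, Frame(energy=0.5, pause=1.0, gaze_at_listener=1.0)))
# ('expressive', 0.9)
```

Because the profile is recomputed from all frames observed so far, the selected style can change as more of the speaker's behavior becomes available during a live interaction. A nearest-prototype lookup keeps the runtime step cheap; a real system would replace the toy predictors with models trained on the speakers assigned to each prototype.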