Virtual humans with realistic behaviors and social skills evoke in users a range of social behaviors normally seen only in human face-to-face interactions. One of the key challenges in creating such virtual humans is giving them human-like conversational skills, such as turn-taking. In this paper, we propose a multimodal end-of-turn prediction model. Instead of recording face-to-face conversation data, we collect turn-taking data using the Parasocial Consensus Sampling (PCS) framework. We then analyze the relationship between verbal and nonverbal features and turn-taking behaviors based on the consensus data, and show how these features influence the time people take to take turns. Finally, we present a probabilistic multimodal end-of-turn prediction model that enables virtual humans to make real-time turn-taking predictions. The results show that our model achieves higher accuracy than previous methods.
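To make the idea of a probabilistic multimodal end-of-turn predictor concrete, the sketch below combines a few verbal and nonverbal cues into a probability with a logistic function. The feature names, weights, and thresholds are invented for illustration only; the paper's actual model and feature set may differ.

```python
import math

# Hypothetical weights for an illustrative end-of-turn classifier.
# These values are assumptions for demonstration, not the paper's model.
WEIGHTS = {
    "pause_duration":   2.0,  # longer silence suggests the turn is ending
    "gaze_at_listener": 1.5,  # speaker gaze toward listener signals yielding
    "pitch_fall":       1.0,  # falling intonation often marks turn ends
}
BIAS = -2.5  # prior toward "turn continues" when no cues are present


def end_of_turn_probability(features):
    """Logistic combination of multimodal cues into P(end of turn)."""
    score = BIAS + sum(WEIGHTS[name] * value
                       for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


# Strong cues (long pause, listener-directed gaze, falling pitch)
# push the probability above the decision threshold...
p_yield = end_of_turn_probability(
    {"pause_duration": 1.0, "gaze_at_listener": 1.0, "pitch_fall": 1.0}
)

# ...while the absence of cues mid-utterance keeps it low,
# so the virtual human holds back instead of interrupting.
p_hold = end_of_turn_probability(
    {"pause_duration": 0.0, "gaze_at_listener": 0.0, "pitch_fall": 0.0}
)
```

In a real-time system, such a predictor would be evaluated at each silence onset (or on a fixed frame rate), with the virtual human taking the turn when the probability crosses a tuned threshold.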