Multimodal conversation scene analysis for understanding people's communicative behaviors in face-to-face meetings

  • Author:
  • Kazuhiro Otsuka

  • Affiliation:
  • NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corp., Atsugi, Kanagawa, Japan

  • Venue:
  • HCII'11: Proceedings of the 1st International Conference on Human Interface and the Management of Information: Interacting with Information - Volume Part II
  • Year:
  • 2011

Abstract

This presentation overviews our recent progress in multimodal conversation scene analysis and discusses its future in terms of designing better human-to-human communication systems. Conversation scene analysis aims to automatically describe conversation scenes from the multimodal nonverbal behaviors of participants, as captured by cameras and microphones. To date, the author's group has proposed a research framework based on the probabilistic modeling of conversation phenomena for solving several basic problems: speaker diarization ("who is speaking when"), addressee identification ("who is talking to whom"), interaction-structure estimation ("who is responding to whom"), estimation of the visual focus of attention (VFOA, "who is looking at whom"), and the inference of interpersonal emotion, such as "who has empathy/antipathy toward whom", all from observed multimodal behaviors including utterances, head pose, head gestures, eye gaze, and facial expressions. This paper summarizes our approach and discusses how conversation scene analysis can be extended to enhance the design of computer-mediated communication systems.
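To make the probabilistic-modeling idea concrete, the sketch below shows a toy VFOA estimator that applies Bayes' rule to a single observed head-pan angle, with one Gaussian likelihood per gaze target. All names, angles, and parameters here are illustrative assumptions; this is not the model proposed in the paper.

```python
import numpy as np

# Hypothetical illustration: infer a participant's visual focus of
# attention (VFOA) from an observed head-pan angle via Bayes' rule,
# with one Gaussian likelihood per gaze target. A toy sketch only,
# not the probabilistic model described in the abstract.

TARGETS = {          # assumed expected head pan (degrees) per gaze target
    "person_B": -30.0,
    "person_C": 0.0,
    "person_D": 30.0,
}
SIGMA = 10.0         # assumed spread of head pan around each target
PRIOR = {t: 1.0 / len(TARGETS) for t in TARGETS}  # uniform prior

def vfoa_posterior(observed_pan: float) -> dict:
    """Posterior P(target | head pan) under the toy Gaussian model."""
    likelihood = {
        t: np.exp(-0.5 * ((observed_pan - mu) / SIGMA) ** 2)
        for t, mu in TARGETS.items()
    }
    unnorm = {t: likelihood[t] * PRIOR[t] for t in TARGETS}
    z = sum(unnorm.values())
    return {t: p / z for t, p in unnorm.items()}

if __name__ == "__main__":
    post = vfoa_posterior(-25.0)  # head turned left: person_B most likely
    for target, p in sorted(post.items(), key=lambda kv: -kv[1]):
        print(f"P({target} | pan=-25deg) = {p:.3f}")
```

In a full framework of the kind the abstract describes, such per-frame cues would presumably be fused over time and across participants and modalities (e.g., with dynamic Bayesian models), rather than judged from a single observation as in this toy example.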