Multimodal conversation scene analysis for understanding people's communicative behaviors in face-to-face meetings

  • Author:
  • Kazuhiro Otsuka

  • Affiliation:
  • NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corp., Atsugi, Kanagawa, Japan

  • Venue:
  • HCII'11: Proceedings of the 1st International Conference on Human Interface and the Management of Information: Interacting with Information - Volume Part II
  • Year:
  • 2011

Abstract

This presentation overviews our recent progress in multimodal conversation scene analysis and discusses its future in terms of designing better human-to-human communication systems. Conversation scene analysis aims to automatically describe conversation scenes from the multimodal nonverbal behaviors of participants, as captured by cameras and microphones. To date, the author's group has proposed a research framework based on the probabilistic modeling of conversation phenomena for solving several basic problems: speaker diarization ("who is speaking when"), addressee identification ("who is talking to whom"), interaction-structure estimation ("who is responding to whom"), estimation of the visual focus of attention (VFOA, "who is looking at whom"), and the inference of interpersonal emotion, such as "who has empathy/antipathy toward whom", all from observed multimodal behaviors including utterances, head pose, head gestures, eye gaze, and facial expressions. This paper summarizes our approach and discusses how conversation scene analysis can be extended to enhance the design of computer-mediated communication systems.
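To make the probabilistic-modeling idea concrete, the sketch below shows a toy VFOA estimator that applies Bayes' rule to a single observed head-pan angle, with one Gaussian likelihood per gaze target. All names, angles, and parameters here are illustrative assumptions; this is not the model proposed in the paper.

```python
import numpy as np

# Hypothetical illustration: infer a participant's visual focus of
# attention (VFOA) from an observed head-pan angle via Bayes' rule,
# with one Gaussian likelihood per gaze target. A toy sketch only,
# not the probabilistic model described in the abstract.

TARGETS = {          # assumed expected head pan (degrees) per gaze target
    "person_B": -30.0,
    "person_C": 0.0,
    "person_D": 30.0,
}
SIGMA = 10.0         # assumed spread of head pan around each target
PRIOR = {t: 1.0 / len(TARGETS) for t in TARGETS}  # uniform prior

def vfoa_posterior(observed_pan: float) -> dict:
    """Posterior P(target | head pan) under the toy Gaussian model."""
    likelihood = {
        t: np.exp(-0.5 * ((observed_pan - mu) / SIGMA) ** 2)
        for t, mu in TARGETS.items()
    }
    unnorm = {t: likelihood[t] * PRIOR[t] for t in TARGETS}
    z = sum(unnorm.values())
    return {t: p / z for t, p in unnorm.items()}

if __name__ == "__main__":
    post = vfoa_posterior(-25.0)  # head turned left: person_B most likely
    for target, p in sorted(post.items(), key=lambda kv: -kv[1]):
        print(f"P({target} | pan=-25deg) = {p:.3f}")
```

In a full framework of the kind the abstract describes, such per-frame cues would presumably be fused over time and across participants and modalities (e.g., with dynamic Bayesian models), rather than judged from a single observation as in this toy example.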