Conversation scene analysis based on dynamic Bayesian network and image-based gaze detection

  • Authors:
  • Sebastian Gorga;Kazuhiro Otsuka

  • Affiliations:
  • NTT Communication Science Laboratories, Morinosato-Wakamiya, Atsugi, Japan;NTT Communication Science Laboratories, Morinosato-Wakamiya, Atsugi, Japan

  • Venue:
  • International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
  • Year:
  • 2010

Abstract

This paper presents a probabilistic framework, which incorporates automatic image-based gaze detection, for inferring the structure of multiparty face-to-face conversations. The framework aims to infer conversation regimes and gaze patterns from the nonverbal behaviors of meeting participants, which are captured as image and audio streams by cameras and microphones. The conversation regime corresponds to a global conversational pattern such as monologue or dialogue, and the gaze pattern indicates "who is looking at whom". Input nonverbal behaviors include the presence or absence of utterances, head directions, and discrete head-centered eye-gaze directions. In contrast to conventional meeting analysis methods, which rely only on the participants' head pose as a surrogate for visual focus of attention, this paper incorporates vision-based gaze detection combined with head pose tracking into a probabilistic conversation model based on a dynamic Bayesian network. Our gaze detector is able to differentiate 3 to 5 different eye-gaze directions, e.g. left, straight, and right. Experiments on four-person conversations confirm the power of the proposed framework in identifying conversation structure and in estimating gaze patterns with higher accuracy than previous models.
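
To give a rough feel for how a dynamic Bayesian network can track a hidden conversation regime over time, the following is a minimal sketch of forward filtering over two regimes. It is not the model from the paper: the two-state regime set, the single binary observation ("speaker changed" or not), and every probability value are assumptions chosen purely for illustration, and the names `forward_filter` and `most_likely_regimes` are hypothetical.

```python
# Illustrative sketch only: a two-state forward filter, NOT the paper's model.
# Every probability below is a made-up placeholder.

REGIMES = ["monologue", "dialogue"]

# P(regime_t | regime_{t-1}); rows index the previous regime (assumed values).
TRANSITION = [
    [0.9, 0.1],
    [0.2, 0.8],
]

# P(observation | regime); observation 0 = same speaker as the previous
# time window, 1 = speaker changed (assumed encoding and values).
EMISSION = [
    [0.95, 0.05],
    [0.60, 0.40],
]

# P(regime_1): uniform prior over regimes (assumed).
PRIOR = [0.5, 0.5]


def forward_filter(observations):
    """Return the filtered belief P(regime_t | obs_1..t) for each step t."""
    n = len(REGIMES)
    belief = list(PRIOR)
    history = []
    for t, obs in enumerate(observations):
        if t == 0:
            # The prior already describes the first time step.
            predicted = belief
        else:
            # Prediction step: push the previous belief through the transitions.
            predicted = [
                sum(belief[i] * TRANSITION[i][j] for i in range(n))
                for j in range(n)
            ]
        # Update step: weight by the observation likelihood, then normalize.
        unnormalized = [predicted[j] * EMISSION[j][obs] for j in range(n)]
        total = sum(unnormalized)
        belief = [u / total for u in unnormalized]
        history.append(belief)
    return history


def most_likely_regimes(observations):
    """Pick the most probable regime at each step from the filtered beliefs."""
    labels = []
    for belief in forward_filter(observations):
        best = max(range(len(belief)), key=lambda j: belief[j])
        labels.append(REGIMES[best])
    return labels
```

Calling `most_likely_regimes([0, 0, 1, 1, 1])` returns one regime label per time window. A fuller model in the spirit of the abstract would add gaze-pattern variables and head-pose and eye-gaze observations as further nodes, which this sketch deliberately leaves out.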