3D user-perspective, voxel-based estimation of visual focus of attention in dynamic meeting scenarios

  • Authors:
  • Michael Voit; Rainer Stiefelhagen

  • Affiliations:
  • Interactive Analysis and Diagnosis, Fraunhofer IOSB, Karlsruhe, Germany; Institute of Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe, Germany

  • Venue:
  • International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
  • Year:
  • 2010

Abstract

In this paper, we present a new framework for the online estimation of people's visual focus of attention from their head poses in dynamic meeting scenarios. We describe a voxel-based approach that reconstructs the scene composition from an observer's perspective in order to integrate occlusion handling and visibility verification. The observer's perspective is simulated with live head pose tracking over four far-field views from the room's upper corners. We further integrate motion and speech activity as scene observations in a Bayesian Surprise framework to model prior attractors of attention within the situation's context. As evaluations on a dedicated dataset of 10 meeting videos show, this allows us to predict a meeting participant's focus of attention correctly in up to 72.2% of all frames.
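The abstract does not give implementation details, but the visibility verification it describes amounts to casting a gaze ray through a voxel occupancy grid and checking which focus target, if any, the ray reaches before an obstacle. The following Python sketch illustrates that idea only; the grid layout, the `first_hit` and `visible_target` helpers, and all parameters are illustrative assumptions, not the authors' code.

```python
import numpy as np

def first_hit(grid, origin, direction, step=0.25, max_dist=10.0):
    """March a gaze ray through a boolean occupancy grid and return the
    index of the first occupied voxel it enters, or None if the ray
    leaves the volume unobstructed (voxels assumed one unit wide)."""
    direction = direction / np.linalg.norm(direction)
    for t in np.arange(step, max_dist, step):
        idx = tuple(np.floor(origin + t * direction).astype(int))
        if any(i < 0 or i >= s for i, s in zip(idx, grid.shape)):
            return None                  # ray exited the reconstructed volume
        if grid[idx]:
            return idx                   # first voxel the observer can see
    return None

def visible_target(grid, head_pos, gaze_dir, targets):
    """Resolve the gaze ray to a focus target with occlusion handling:
    a target only counts if its voxels are hit before any obstacle.
    targets maps a label to the set of voxel indices it occupies."""
    hit = first_hit(grid, head_pos, gaze_dir)
    if hit is None:
        return None
    for label, voxels in targets.items():
        if hit in voxels:
            return label
    return None                          # hit an unlabeled obstacle: occluded

# Hypothetical demo: person B is visible until an obstacle enters the line of sight.
grid = np.zeros((12, 12, 12), dtype=bool)
grid[8, 5, 5] = True                     # person B's head voxel
targets = {"person_B": {(8, 5, 5)}}
head, gaze = np.array([2.0, 5.5, 5.5]), np.array([1.0, 0.0, 0.0])
print(visible_target(grid, head, gaze, targets))   # -> "person_B"
grid[5, 5, 5] = True                     # obstacle between observer and target
print(visible_target(grid, head, gaze, targets))   # -> None (occluded)
```

Because the voxel grid is reconstructed live from the observer's tracked head position, the same ray test simultaneously answers which target is gazed at and whether it is actually visible from that seat.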
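Bayesian Surprise, as introduced by Itti and Baldi, scores an observation by the KL divergence between the posterior it induces and the prior belief. The abstract does not specify the authors' observation models, so the sketch below only illustrates the principle under assumed models: each target's motion or speech activity is tracked with a Beta-Bernoulli model, and a sudden onset of activity yields high surprise, marking that target as a prior attractor of attention. The `SurpriseTracker` class and its decay parameter are hypothetical.

```python
from scipy.special import betaln, digamma

def kl_beta(a1, b1, a0, b0):
    """KL( Beta(a1, b1) || Beta(a0, b0) ), in nats."""
    return (betaln(a0, b0) - betaln(a1, b1)
            + (a1 - a0) * digamma(a1)
            + (b1 - b0) * digamma(b1)
            + ((a0 - a1) + (b0 - b1)) * digamma(a1 + b1))

class SurpriseTracker:
    """Tracks one focus target's binary activity stream (speaking or
    moving in the current frame) with a Beta-Bernoulli model.  The
    surprise of a frame is the KL divergence between the updated
    posterior and the previous belief, so sudden activity onsets score
    high while sustained activity quickly becomes unsurprising."""
    def __init__(self, a=1.0, b=1.0, decay=0.95):
        self.a, self.b, self.decay = a, b, decay

    def observe(self, active):
        a1 = self.a + float(active)
        b1 = self.b + (1.0 - float(active))
        s = kl_beta(a1, b1, self.a, self.b)
        # Decay old evidence toward the uniform prior so the model
        # stays responsive to changes in the scene (assumed scheme).
        self.a = 1.0 + self.decay * (a1 - 1.0)
        self.b = 1.0 + self.decay * (b1 - 1.0)
        return s

# One tracker per candidate target; the per-frame surprises can serve
# as prior weights over where attention is drawn in the current context.
trackers = {t: SurpriseTracker() for t in ("person_A", "person_B", "screen")}
frame_speech = {"person_A": 0.0, "person_B": 1.0, "screen": 0.0}
surprise = {t: trk.observe(frame_speech[t]) for t, trk in trackers.items()}
```

Such surprise scores would then be fused with the head-pose-based gaze estimate, biasing the focus decision toward targets that just started moving or speaking.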