3D user-perspective, voxel-based estimation of visual focus of attention in dynamic meeting scenarios

  • Authors:
  • Michael Voit; Rainer Stiefelhagen

  • Affiliations:
  • Interactive Analysis and Diagnosis, Fraunhofer IOSB, Karlsruhe, Germany; Institute of Anthropomatics, Karlsruhe Institute of Technology, Karlsruhe, Germany

  • Venue:
  • International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
  • Year:
  • 2010

Abstract

In this paper, we present a new framework for the online estimation of people's visual focus of attention from their head poses in dynamic meeting scenarios. We describe a voxel-based approach that reconstructs the scene composition from an observer's perspective in order to integrate occlusion handling and visibility verification. The observer's perspective is simulated with live head pose tracking over four far-field views from the room's upper corners. We further integrate motion and speech activity as scene observations in a Bayesian Surprise framework to model prior attractors of attention within the situation's context. As evaluations on a dedicated dataset of 10 meeting videos show, this allows us to predict a meeting participant's focus of attention correctly in up to 72.2% of all frames.
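The abstract does not give implementation details, but the visibility verification it describes amounts to casting a gaze ray through a voxel occupancy grid and checking which focus target, if any, the ray reaches before an obstacle. The following Python sketch illustrates that idea only; the grid layout, the `first_hit` and `visible_target` helpers, and all parameters are illustrative assumptions, not the authors' code.

```python
import numpy as np

def first_hit(grid, origin, direction, step=0.25, max_dist=10.0):
    """March a gaze ray through a boolean occupancy grid and return the
    index of the first occupied voxel it enters, or None if the ray
    leaves the volume unobstructed (voxels assumed one unit wide)."""
    direction = direction / np.linalg.norm(direction)
    for t in np.arange(step, max_dist, step):
        idx = tuple(np.floor(origin + t * direction).astype(int))
        if any(i < 0 or i >= s for i, s in zip(idx, grid.shape)):
            return None                  # ray exited the reconstructed volume
        if grid[idx]:
            return idx                   # first voxel the observer can see
    return None

def visible_target(grid, head_pos, gaze_dir, targets):
    """Resolve the gaze ray to a focus target with occlusion handling:
    a target only counts if its voxels are hit before any obstacle.
    targets maps a label to the set of voxel indices it occupies."""
    hit = first_hit(grid, head_pos, gaze_dir)
    if hit is None:
        return None
    for label, voxels in targets.items():
        if hit in voxels:
            return label
    return None                          # hit an unlabeled obstacle: occluded

# Hypothetical demo: person B is visible until an obstacle enters the line of sight.
grid = np.zeros((12, 12, 12), dtype=bool)
grid[8, 5, 5] = True                     # person B's head voxel
targets = {"person_B": {(8, 5, 5)}}
head, gaze = np.array([2.0, 5.5, 5.5]), np.array([1.0, 0.0, 0.0])
print(visible_target(grid, head, gaze, targets))   # -> "person_B"
grid[5, 5, 5] = True                     # obstacle between observer and target
print(visible_target(grid, head, gaze, targets))   # -> None (occluded)
```

Because the voxel grid is reconstructed live from the observer's tracked head position, the same ray test simultaneously answers which target is gazed at and whether it is actually visible from that seat.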
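Bayesian Surprise, as introduced by Itti and Baldi, scores an observation by the KL divergence between the posterior it induces and the prior belief. The abstract does not specify the authors' observation models, so the sketch below only illustrates the principle under assumed models: each target's motion or speech activity is tracked with a Beta-Bernoulli model, and a sudden onset of activity yields high surprise, marking that target as a prior attractor of attention. The `SurpriseTracker` class and its decay parameter are hypothetical.

```python
from scipy.special import betaln, digamma

def kl_beta(a1, b1, a0, b0):
    """KL( Beta(a1, b1) || Beta(a0, b0) ), in nats."""
    return (betaln(a0, b0) - betaln(a1, b1)
            + (a1 - a0) * digamma(a1)
            + (b1 - b0) * digamma(b1)
            + ((a0 - a1) + (b0 - b1)) * digamma(a1 + b1))

class SurpriseTracker:
    """Tracks one focus target's binary activity stream (speaking or
    moving in the current frame) with a Beta-Bernoulli model.  The
    surprise of a frame is the KL divergence between the updated
    posterior and the previous belief, so sudden activity onsets score
    high while sustained activity quickly becomes unsurprising."""
    def __init__(self, a=1.0, b=1.0, decay=0.95):
        self.a, self.b, self.decay = a, b, decay

    def observe(self, active):
        a1 = self.a + float(active)
        b1 = self.b + (1.0 - float(active))
        s = kl_beta(a1, b1, self.a, self.b)
        # Decay old evidence toward the uniform prior so the model
        # stays responsive to changes in the scene (assumed scheme).
        self.a = 1.0 + self.decay * (a1 - 1.0)
        self.b = 1.0 + self.decay * (b1 - 1.0)
        return s

# One tracker per candidate target; the per-frame surprises can serve
# as prior weights over where attention is drawn in the current context.
trackers = {t: SurpriseTracker() for t in ("person_A", "person_B", "screen")}
frame_speech = {"person_A": 0.0, "person_B": 1.0, "screen": 0.0}
surprise = {t: trk.observe(frame_speech[t]) for t, trk in trackers.items()}
```

Such surprise scores would then be fused with the head-pose-based gaze estimate, biasing the focus decision toward targets that just started moving or speaking.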