Real-time tracking of visually attended objects in interactive virtual environments

Authors:
Sungkil Lee;Gerard Jounghyun Kim;Seungmoon Choi
Affiliations:
Haptics and Virtual Reality Laboratory, POSTECH;Korea University;Haptics and Virtual Reality Laboratory, POSTECH
Venue:
Proceedings of the 2007 ACM symposium on Virtual reality software and technology
Year:
2007

Abstract

This paper presents a real-time framework for computationally tracking the objects that a user visually attends to while navigating interactive virtual environments. In addition to conventional bottom-up (stimulus-driven) features, the framework uses top-down (goal-directed) contexts to predict human gaze. The framework first builds feature maps from preattentive features such as luminance, hue, depth, size, and motion. The feature maps are then integrated into a single saliency map using the center-surround difference operation. This pixel-level bottom-up saliency map is converted to an object-level saliency map using the item buffer. Finally, the top-down contexts are inferred from the user's spatial and temporal behaviors during interactive navigation and are used to select the most plausibly attended object among the candidates in the object saliency map. The framework was implemented on the GPU and computed a 256×256 saliency map in 5.68 ms, which substantiates its adequacy for interactive virtual environments. A user experiment was also conducted to evaluate the prediction accuracy of the visual attention tracking framework against actual human gaze data. The attained accuracy was well supported by the theory of human cognition on visually identifying single and multiple attended targets, especially owing to the addition of top-down contextual information. The framework can be used for perceptually based rendering, such as providing depth-of-field effects and managing level of detail in virtual environments, without an expensive eye tracker.
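The bottom-up part of the pipeline described above (feature map, center-surround difference, item buffer lookup) can be illustrated with a minimal CPU sketch. This is not the authors' GPU implementation. The box filter standing in for the center and surround scales, the use of a single luminance feature, the per-object averaging, and all function names are assumptions made for illustration only. The top-down selection step is omitted because the abstract does not specify it.

```python
from collections import defaultdict


def box_mean(img, radius):
    """Mean over each pixel's (2r+1)x(2r+1) neighborhood, clipped at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(img[j][i] for j in ys for i in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out


def center_surround(feature, center_r=0, surround_r=1):
    """Absolute difference between a fine (center) and coarse (surround) scale.

    Assumption: box filters approximate the two scales.
    """
    center = box_mean(feature, center_r)
    surround = box_mean(feature, surround_r)
    return [[abs(c - s) for c, s in zip(crow, srow)]
            for crow, srow in zip(center, surround)]


def object_saliency(pixel_saliency, item_buffer, background=0):
    """Average pixel saliency per object ID found in the item buffer.

    Assumption: averaging is one plausible aggregation; the abstract only
    says the item buffer is used for the pixel-to-object conversion.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for srow, irow in zip(pixel_saliency, item_buffer):
        for s, obj in zip(srow, irow):
            if obj != background:
                totals[obj] += s
                counts[obj] += 1
    return {obj: totals[obj] / counts[obj] for obj in totals}


# Hypothetical 4x4 luminance feature map with one bright pixel.
luminance = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical item buffer: each pixel stores the ID of the visible object
# (0 = background). Object 1 covers the bright pixel, object 2 a flat region.
item_buffer = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 2],
]

saliency = center_surround(luminance)
scores = object_saliency(saliency, item_buffer)
attended = max(scores, key=scores.get)  # bottom-up candidate only
```

In this toy scene, object 1 stands out from its surround while object 2 lies in a uniform region, so object 1 becomes the bottom-up candidate. In the framework described in the abstract, such candidates would then be re-ranked using top-down context inferred from the user's navigation behavior.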