Learning attention based saliency in videos from human eye movements

  • Authors:
  • Sunaad Nataraju; Vineeth Balasubramanian; Sethuraman Panchanathan

  • Affiliations:
  • Center for Cognitive Ubiquitous Computing, Arizona State University, Tempe, AZ (all authors)

  • Venue:
  • WMVC '09: Proceedings of the 2009 International Conference on Motion and Video Computing
  • Year:
  • 2009

Abstract

In this paper we propose a framework to learn and predict saliency in videos using human eye movements. In our approach, we record the eye-gaze of users as they watch videos, and then learn the low-level features of regions that are of visual interest. The learnt classifier is then used to predict salient regions in videos belonging to the same application. So far, predicting saliency in images and videos has been approached mainly from two perspectives, namely visual attention modeling and spatio-temporal interest point detection. Such approaches are purely vision-based and detect regions having a predefined set of characteristics, such as complex motion or high contrast, for all kinds of videos. However, what is 'interesting' varies from one application to another. By learning features of regions that capture the attention of viewers while watching a video, we aim to distinguish those that are actually salient in the given context from the rest. This is especially useful in an environment where users are interested only in a certain kind of activity, as in the case of surveillance or biomedical applications. In this paper, the proposed framework is implemented using a neural network that learns the low-level features defined in the visual attention modeling literature (Itti's saliency model) based on the interesting regions identified by the eye-gaze movements of viewers. In our experiments with news videos of popular channels, the results show a significant improvement in the identification of relevant salient regions in such videos when compared with existing approaches.
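
The sketch below is not the authors' implementation; it is a minimal illustration of the pipeline the abstract describes, under simplifying assumptions: per-pixel low-level features in the spirit of Itti's model (intensity, colour opponency, orientation energy, and frame-difference motion) are labelled as salient or not according to viewer fixations, and a small neural network classifier is trained on them. The feature extraction is heavily simplified, and helper names such as `extract_features` and `fixations_to_labels` are hypothetical.

```python
# Minimal sketch of gaze-supervised saliency learning (assumed pipeline, not the paper's code).
import numpy as np
from scipy.ndimage import sobel, gaussian_filter
from sklearn.neural_network import MLPClassifier


def extract_features(frame_rgb, prev_frame_rgb):
    """Return an (H*W, 5) matrix of simplified Itti-style features for one frame."""
    frame = frame_rgb.astype(np.float32) / 255.0
    prev = prev_frame_rgb.astype(np.float32) / 255.0

    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    intensity = (r + g + b) / 3.0                        # intensity channel
    rg_opponency = r - g                                 # red-green opponency
    by_opponency = b - (r + g) / 2.0                     # blue-yellow opponency
    orientation = np.hypot(sobel(intensity, axis=0),     # crude orientation energy
                           sobel(intensity, axis=1))
    motion = np.abs(intensity - prev.mean(axis=2))       # frame-difference motion

    channels = (intensity, rg_opponency, by_opponency, orientation, motion)
    feats = np.stack([gaussian_filter(c, sigma=2) for c in channels], axis=-1)
    return feats.reshape(-1, feats.shape[-1])


def fixations_to_labels(fixations, shape, radius=15):
    """Label pixels within `radius` of any recorded gaze point (y, x) as salient (1)."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    labels = np.zeros((h, w), dtype=np.int64)
    for fy, fx in fixations:
        labels[(yy - fy) ** 2 + (xx - fx) ** 2 <= radius ** 2] = 1
    return labels.reshape(-1)


def train_saliency_model(frames, prev_frames, fixations_per_frame):
    """Fit a small neural network on gaze-labelled features from training videos."""
    X = np.vstack([extract_features(f, p) for f, p in zip(frames, prev_frames)])
    y = np.concatenate([fixations_to_labels(fix, f.shape[:2])
                        for fix, f in zip(fixations_per_frame, frames)])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300)
    clf.fit(X, y)
    return clf


def predict_saliency_map(clf, frame_rgb, prev_frame_rgb):
    """Per-pixel probability of being salient, reshaped to the frame size."""
    X = extract_features(frame_rgb, prev_frame_rgb)
    return clf.predict_proba(X)[:, 1].reshape(frame_rgb.shape[:2])
```

In this reading, "saliency" is application-specific: the classifier only sees fixation-labelled examples from videos of the target domain (e.g. news footage), so the learnt decision boundary reflects what viewers of that domain actually attend to rather than a fixed, purely vision-based definition of interest.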