Learning attention based saliency in videos from human eye movements

  • Authors:
  • Sunaad Nataraju; Vineeth Balasubramanian; Sethuraman Panchanathan

  • Affiliations:
  • Center for Cognitive Ubiquitous Computing, Arizona State University, Tempe, AZ (all authors)

  • Venue:
  • WMVC '09: Proceedings of the 2009 International Conference on Motion and Video Computing
  • Year:
  • 2009

Abstract

In this paper we propose a framework to learn and predict saliency in videos using human eye movements. In our approach, we record the eye-gaze of users as they watch videos, and then learn the low-level features of regions that are of visual interest. The learnt classifier is then used to predict salient regions in videos belonging to the same application. So far, predicting saliency in images and videos has been approached mainly from two perspectives, namely visual attention modeling and spatio-temporal interest point detection. Such approaches are purely vision-based and detect regions having a predefined set of characteristics, such as complex motion or high contrast, for all kinds of videos. However, what is 'interesting' varies from one application to another. By learning features of regions that capture the attention of viewers while watching a video, we aim to distinguish those that are actually salient in the given context from the rest. This is especially useful in an environment where users are interested only in a certain kind of activity, as in the case of surveillance or biomedical applications. In this paper, the proposed framework is implemented using a neural network that learns the low-level features defined in the visual attention modeling literature (Itti's saliency model) based on the interesting regions identified by the eye-gaze movements of viewers. In our experiments with news videos of popular channels, the results show a significant improvement in the identification of relevant salient regions in such videos when compared with existing approaches.
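
The sketch below is not the authors' implementation; it is a minimal illustration of the pipeline the abstract describes, under simplifying assumptions: per-pixel low-level features in the spirit of Itti's model (intensity, colour opponency, orientation energy, and frame-difference motion) are labelled as salient or not according to viewer fixations, and a small neural network classifier is trained on them. The feature extraction is heavily simplified, and helper names such as `extract_features` and `fixations_to_labels` are hypothetical.

```python
# Minimal sketch of gaze-supervised saliency learning (assumed pipeline, not the paper's code).
import numpy as np
from scipy.ndimage import sobel, gaussian_filter
from sklearn.neural_network import MLPClassifier


def extract_features(frame_rgb, prev_frame_rgb):
    """Return an (H*W, 5) matrix of simplified Itti-style features for one frame."""
    frame = frame_rgb.astype(np.float32) / 255.0
    prev = prev_frame_rgb.astype(np.float32) / 255.0

    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    intensity = (r + g + b) / 3.0                        # intensity channel
    rg_opponency = r - g                                 # red-green opponency
    by_opponency = b - (r + g) / 2.0                     # blue-yellow opponency
    orientation = np.hypot(sobel(intensity, axis=0),     # crude orientation energy
                           sobel(intensity, axis=1))
    motion = np.abs(intensity - prev.mean(axis=2))       # frame-difference motion

    channels = (intensity, rg_opponency, by_opponency, orientation, motion)
    feats = np.stack([gaussian_filter(c, sigma=2) for c in channels], axis=-1)
    return feats.reshape(-1, feats.shape[-1])


def fixations_to_labels(fixations, shape, radius=15):
    """Label pixels within `radius` of any recorded gaze point (y, x) as salient (1)."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    labels = np.zeros((h, w), dtype=np.int64)
    for fy, fx in fixations:
        labels[(yy - fy) ** 2 + (xx - fx) ** 2 <= radius ** 2] = 1
    return labels.reshape(-1)


def train_saliency_model(frames, prev_frames, fixations_per_frame):
    """Fit a small neural network on gaze-labelled features from training videos."""
    X = np.vstack([extract_features(f, p) for f, p in zip(frames, prev_frames)])
    y = np.concatenate([fixations_to_labels(fix, f.shape[:2])
                        for fix, f in zip(fixations_per_frame, frames)])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300)
    clf.fit(X, y)
    return clf


def predict_saliency_map(clf, frame_rgb, prev_frame_rgb):
    """Per-pixel probability of being salient, reshaped to the frame size."""
    X = extract_features(frame_rgb, prev_frame_rgb)
    return clf.predict_proba(X)[:, 1].reshape(frame_rgb.shape[:2])
```

In this reading, "saliency" is application-specific: the classifier only sees fixation-labelled examples from videos of the target domain (e.g. news footage), so the learnt decision boundary reflects what viewers of that domain actually attend to rather than a fixed, purely vision-based definition of interest.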