A Model of Saliency-Based Visual Attention for Rapid Scene Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Efficient Visual Event Detection Using Volumetric Features. Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV '05).
Learning an Interest Operator from Human Eye Movements. Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW '06).
Behavior recognition via sparse spatio-temporal features. ICCCN '05 Proceedings of the 14th International Conference on Computer Communications and Networks.
Spatiotemporal salient points for visual recognition of human actions. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics.
In this paper we propose a framework to learn and predict saliency in videos using human eye movements. In our approach, we record the eye gaze of users as they watch videos and then learn the low-level features of the regions that attract visual interest. The learned classifier is then used to predict salient regions in videos from the same application domain. So far, predicting saliency in images and videos has been approached mainly from two perspectives: visual attention modeling and spatio-temporal interest point detection. Such approaches are purely vision-based and detect regions with a predefined set of characteristics, such as complex motion or high contrast, in all kinds of videos. However, what is 'interesting' varies from one application to another. By learning the features of regions that capture viewers' attention while they watch a video, we aim to distinguish regions that are actually salient in the given context from the rest. This is especially useful in settings where users are interested only in a certain kind of activity, as in surveillance or biomedical applications. In this paper, the proposed framework is implemented using a neural network that learns the low-level features defined in the visual attention modeling literature (Itti's saliency model) from the interesting regions identified by viewers' eye-gaze movements. In our experiments with news videos from popular channels, the results show a significant improvement in the identification of relevant salient regions, compared with existing approaches.
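The pipeline the abstract describes — extract low-level features per pixel, label pixels near recorded fixations as salient, and train a classifier to predict saliency — can be sketched on synthetic data. This is a minimal stand-in, not the paper's implementation: it uses a toy frame, a single simulated fixation, two crude features (intensity and gradient magnitude, loosely standing in for Itti-style contrast channels), and plain logistic regression in place of the paper's neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for one video frame: a dark background plus one
# bright disc that, in this toy setup, viewers are assumed to fixate on.
H, W = 64, 64
ys, xs = np.mgrid[0:H, 0:W]
disc = (ys - 32) ** 2 + (xs - 32) ** 2 < 8 ** 2
frame = rng.normal(0.2, 0.02, (H, W))
frame[disc] += 0.6

# Low-level features per pixel: raw intensity and gradient magnitude
# (a crude proxy for the contrast features used in saliency models).
gy, gx = np.gradient(frame)
feats = np.stack([frame, np.hypot(gx, gy)], axis=-1).reshape(-1, 2)

# Hypothetical gaze labels: pixels near the recorded fixation are salient.
labels = disc.astype(float).ravel()

# Standardize features, then fit a logistic-regression "interest"
# classifier by gradient descent (a minimal stand-in for the paper's NN).
mu, sd = feats.mean(0), feats.std(0)
X = (feats - mu) / sd
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid probabilities
    g = p - labels                           # gradient of logistic loss
    w -= 0.5 * (X.T @ g) / len(g)
    b -= 0.5 * g.mean()

pred = (X @ w + b) > 0.0   # score > 0 <=> predicted probability > 0.5
acc = (pred == disc.ravel()).mean()
print(f"pixelwise accuracy: {acc:.3f}")
```

On real data, the feature vector would come from the saliency model's contrast maps at each pixel, and the positive labels from eye-tracker fixations aggregated over several viewers; the classifier is then applied to unseen videos from the same domain to flag contextually salient regions.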