Despite significant recent progress, the best available visual saliency models still lag behind human performance in predicting eye fixations during free viewing of natural scenes. The majority of models are based on low-level visual features, and the importance of top-down factors has not yet been fully explored or modeled. Here, we combine low-level features such as orientation, color, and intensity, together with the saliency maps of the best previous bottom-up models, with top-down cognitive visual features (e.g., faces, humans, cars, etc.), and learn a direct mapping from those features to eye fixations using Regression, SVM, and AdaBoost classifiers. Through extensive experiments over three benchmark eye-tracking datasets using three popular evaluation scores, we show that our boosting model outperforms 27 state-of-the-art models and is so far the closest model to human accuracy for fixation prediction. Furthermore, our model successfully detects the most salient object in a scene without sophisticated image processing such as region segmentation.
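The learned mapping described above can be sketched as a per-pixel classification problem: each pixel contributes a feature vector (one value per low-level or top-down channel), labeled by whether it was fixated, and a boosting classifier is trained on those pairs. The sketch below is not the authors' implementation; it uses scikit-learn's `AdaBoostClassifier` on synthetic data, and the channel names and threshold are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the authors' code): learn a direct
# mapping from per-pixel feature vectors to fixation labels with AdaBoost.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
H, W, F = 32, 32, 4  # image size and number of feature channels
                     # (e.g., orientation, color, intensity, face detector)
features = rng.random((H, W, F))

# Synthetic ground truth: fixations occur where channel 0 is strong.
fixated = features[..., 0] > 0.8

X = features.reshape(-1, F)          # one feature vector per pixel
y = fixated.reshape(-1).astype(int)  # 1 = fixated, 0 = not fixated

# Train the boosting classifier on (feature vector -> fixated?) pairs.
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# Predicted fixation probability over all pixels forms the saliency map.
saliency = clf.predict_proba(X)[:, 1].reshape(H, W)
```

At test time the same feature extraction is applied to a new image, and the classifier's probability output is read off as the saliency map, which can then be scored against human fixations with standard metrics such as AUC.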