Despite significant recent progress, the best available visual saliency models still lag behind human performance in predicting eye fixations during free viewing of natural scenes. The majority of models are based on low-level visual features, and the importance of top-down factors has not yet been fully explored or modeled. Here, we combine low-level features such as orientation, color, and intensity, together with the saliency maps of the best previous bottom-up models, with top-down cognitive visual features (e.g., faces, humans, cars, etc.), and learn a direct mapping from those features to eye fixations using Regression, SVM, and AdaBoost classifiers. Through extensive experiments over three benchmark eye-tracking datasets using three popular evaluation scores, we show that our boosting model outperforms 27 state-of-the-art models and is so far the closest model to human accuracy for fixation prediction. Furthermore, our model successfully detects the most salient object in a scene without sophisticated image processing such as region segmentation.
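The learned mapping described above can be sketched as a per-pixel classification problem: each pixel contributes a feature vector (one value per low-level or top-down channel), labeled by whether it was fixated, and a boosting classifier is trained on those pairs. The sketch below is not the authors' implementation; it uses scikit-learn's `AdaBoostClassifier` on synthetic data, and the channel names and threshold are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the authors' code): learn a direct
# mapping from per-pixel feature vectors to fixation labels with AdaBoost.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
H, W, F = 32, 32, 4  # image size and number of feature channels
                     # (e.g., orientation, color, intensity, face detector)
features = rng.random((H, W, F))

# Synthetic ground truth: fixations occur where channel 0 is strong.
fixated = features[..., 0] > 0.8

X = features.reshape(-1, F)          # one feature vector per pixel
y = fixated.reshape(-1).astype(int)  # 1 = fixated, 0 = not fixated

# Train the boosting classifier on (feature vector -> fixated?) pairs.
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# Predicted fixation probability over all pixels forms the saliency map.
saliency = clf.predict_proba(X)[:, 1].reshape(H, W)
```

At test time the same feature extraction is applied to a new image, and the classifier's probability output is read off as the saliency map, which can then be scored against human fixations with standard metrics such as AUC.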