This paper investigates the potential benefit of using low-level human vision behaviors for high-level semantic concept detection. Many current approaches rely on the Bag-of-Words (BoW) model, which has proven to be a good choice, especially for object recognition in images. Extending it from static images to video sequences raises new problems, chiefly how to use the temporal information related to the concepts to be detected (swimming, drinking, ...). In this study, we propose to apply a human retina model to preprocess video sequences before running a state-of-the-art BoW analysis. This preprocessing, designed to enhance relevant information, improves performance by adding robustness to common image and video problems such as luminance variation, shadows, compression artifacts, and noise. Additionally, we propose a new segmentation method that selects low-level spatio-temporal potential areas of interest in the visual scene without slowing the computation as much as a high-level saliency model would. These approaches are evaluated on the TRECVID 2010 and 2011 Semantic Indexing Task datasets, which contain from 130 to 346 high-level semantic concepts. We also experiment with various parameter settings to check their effect on performance.
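For readers unfamiliar with the BoW model mentioned above, the following is a minimal sketch of its quantization step only, not the authors' pipeline. It assumes local descriptors have already been extracted (in the paper's setting, from retina-preprocessed frames) and that a codebook of visual words is already available, for example from k-means clustering. The function names `assign_to_codebook` and `bow_histogram`, the nearest-centroid assignment, and the L1 normalization are illustrative assumptions, not details given in the abstract.

```python
import numpy as np


def assign_to_codebook(descriptors, codebook):
    """Return, for each descriptor, the index of its nearest visual word.

    descriptors: array of shape (n, d), one local descriptor per row.
    codebook:    array of shape (k, d), one visual word (centroid) per row.
    Hard nearest-centroid assignment is an assumption made for illustration.
    """
    # Squared Euclidean distance between every descriptor and every word.
    diff = descriptors[:, None, :] - codebook[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def bow_histogram(descriptors, codebook):
    """Build an L1-normalized histogram of visual-word occurrences."""
    words = assign_to_codebook(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    total = hist.sum()
    # Leave an all-zero histogram untouched when there are no descriptors.
    return hist / total if total > 0 else hist


# Toy example: two visual words in a 2-D descriptor space (hypothetical data).
toy_codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
toy_descriptors = np.array([[0.0, 1.0], [9.0, 9.0], [11.0, 10.0], [1.0, 0.0]])
toy_hist = bow_histogram(toy_descriptors, toy_codebook)
print(toy_hist)
```

In a video setting, one such histogram could be computed per keyframe or per shot and passed to a per-concept classifier; how the temporal information is incorporated is the question the abstract raises, and this sketch does not address it.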