In visual recognition tasks such as scene categorization, representing an image by its local features, for example with the bag-of-visual-words (BOVW) model or the bag-of-contextual-visual-words (BOCVW) model, has become one of the most popular and successful approaches. In this paper, we propose a method that uses localized maximum-margin learning to fuse different types of features during BOCVW modeling for scene classification. The features are fused at the stage where either the best contextual visual word is selected to represent a local region (hard assignment) or the probabilities of the candidate contextual visual words for that region are estimated (soft assignment). The method has two merits: (1) errors caused by the ambiguity of a single feature when assigning local regions to contextual visual words can be corrected or, in the soft-assignment case, the probabilities of the candidate contextual visual words can be estimated more accurately; and (2) it offers a more flexible way of fusing the features, because the similarity metric is determined locally by localized maximum-margin learning. The proposed method has been evaluated experimentally, and the results indicate its effectiveness.
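The difference between hard and soft assignment under feature fusion can be illustrated with a minimal sketch. It is not the paper's implementation: the two feature types (`appearance`, `color`), the distance values, the hand-set `weights`, and the softmax with a `temperature` parameter are all assumptions chosen for illustration. In particular, the fixed weights merely stand in for the similarity metric that localized maximum-margin learning would determine for each region.

```python
import math

def fuse_distances(per_feature_distances, weights):
    """Weighted sum of per-feature distances, one value per candidate word.

    `weights` is a hypothetical stand-in for a locally learned metric.
    """
    n_candidates = len(per_feature_distances[0])
    fused = [0.0] * n_candidates
    for w, dists in zip(weights, per_feature_distances):
        for k, d in enumerate(dists):
            fused[k] += w * d
    return fused

def hard_assign(distances):
    """Hard assignment: index of the candidate word with the smallest distance."""
    return min(range(len(distances)), key=distances.__getitem__)

def soft_assign(distances, temperature=1.0):
    """Soft assignment: softmax over negative distances (an assumed choice)."""
    m = min(distances)  # shift for numerical stability
    scores = [math.exp(-(d - m) / temperature) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]

# Toy distances from one local region to three candidate contextual visual words.
appearance = [0.40, 0.45, 0.90]  # ambiguous between words 0 and 1
color = [0.80, 0.20, 0.70]       # clearly favors word 1
weights = [0.5, 0.5]             # hypothetical; not learned here

fused = fuse_distances([appearance, color], weights)
single_feature_choice = hard_assign(appearance)
fused_choice = hard_assign(fused)
probs = soft_assign(fused)
```

In this toy case the appearance feature alone narrowly picks word 0, while the fused distance picks word 1, which is the kind of single-feature ambiguity that merit (1) refers to. The soft-assignment probabilities sum to one and give word 1 the largest share.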