Recently, the coding of local features (e.g., SIFT) for image categorization tasks has been extensively studied. Incorporated within the Bag of Words (BoW) framework, these techniques optimize the projection of local features onto the visual codebook, leading to state-of-the-art performance on many benchmark datasets. In this work, we propose a novel visual codebook learning approach that uses the restricted Boltzmann machine (RBM) as the generative model. Our contribution is three-fold. Firstly, we steer the unsupervised RBM learning using a regularization scheme that combines a prior for the sparsity of each feature's representation with a prior for the selectivity of each codeword. The codewords are then fine-tuned to be discriminative through supervised learning from top-down labels. Secondly, we evaluate the proposed method on the Caltech-101 and 15-Scenes datasets, matching or outperforming state-of-the-art results; the learned codebooks are compact and inference is fast. Finally, we introduce an original method to visualize the codebooks and decipher what each visual codeword encodes.
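The core ingredients described above — a Bernoulli RBM trained with contrastive divergence plus a regularizer that nudges hidden-unit activations toward a sparsity target — can be sketched in NumPy. This is an illustrative simplification, not the authors' exact formulation: the class name `SparseRBM`, the single sparsity target `rho`, and all hyperparameter values are assumptions, and the combined sparsity/selectivity prior is reduced here to a simple mean-activation penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SparseRBM:
    """Bernoulli RBM trained with CD-1 and a sparsity penalty on hidden units.

    Illustrative sketch only: the paper's regularizer combines sparsity and
    selectivity priors; here both are approximated by pushing the mean hidden
    activation toward a single target `rho`.
    """

    def __init__(self, n_visible, n_hidden, lr=0.05,
                 target_sparsity=0.05, sparsity_cost=0.1):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases
        self.lr = lr
        self.rho = target_sparsity
        self.lam = sparsity_cost

    def train_step(self, v0):
        # Positive phase: hidden probabilities given the data.
        h0 = sigmoid(v0 @ self.W + self.b_h)
        # Negative phase: one Gibbs step (CD-1 reconstruction).
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h_sample @ self.W.T + self.b_v)
        h1 = sigmoid(v1 @ self.W + self.b_h)
        # Contrastive-divergence gradient estimate.
        dW = (v0.T @ h0 - v1.T @ h1) / len(v0)
        # Sparsity regularization: drive mean hidden activation toward rho.
        sparsity_grad = self.rho - h0.mean(axis=0)
        self.W += self.lr * (dW + self.lam * sparsity_grad)
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * ((h0 - h1).mean(axis=0) + self.lam * sparsity_grad)

    def encode(self, v):
        """Project descriptors onto the codebook (hidden activations)."""
        return sigmoid(v @ self.W + self.b_h)

# Toy usage: encode 128-D descriptors (stand-ins for SIFT) with 64 codewords.
X = rng.random((256, 128))
rbm = SparseRBM(n_visible=128, n_hidden=64)
for _ in range(50):
    rbm.train_step(X)
codes = rbm.encode(X)  # 256 x 64 sparse-ish representation
```

A supervised fine-tuning stage, as the abstract describes, would then adjust `W` with gradients from a classifier on top of `codes`; that step is omitted here.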