Bilevel visual words coding for image classification

Authors:
Jiemi Zhang;Chenxia Wu;Deng Cai;Jianke Zhu
Affiliations:
The State Key Lab of CAD&CG, College of Computer Science, Zhejiang University, China;The State Key Lab of CAD&CG, College of Computer Science, Zhejiang University, China;The State Key Lab of CAD&CG, College of Computer Science, Zhejiang University, China;The State Key Lab of CAD&CG, College of Computer Science, Zhejiang University, China
Venue:
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Year:
2013

Citing 16
Cited 0

Video Google: A Text Retrieval Approach to Object Matching in Videos

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories

CVPRW '04 Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04) Volume 12 - Volume 12
Histograms of Oriented Gradients for Human Detection

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1 - Volume 01
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Online Passive-Aggressive Algorithms

The Journal of Machine Learning Research
A survey of kernel and spectral methods for clustering

Pattern Recognition
Supervised Learning of Quantizer Codebooks by Information Loss Minimization

IEEE Transactions on Pattern Analysis and Machine Intelligence
Can movies and books collaborate?: cross-domain collaborative filtering for sparsity reduction

IJCAI'09 Proceedings of the 21st international jont conference on Artifical intelligence
Visual Word Ambiguity

IEEE Transactions on Pattern Analysis and Machine Intelligence
Unsupervised feature selection for multi-cluster data

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
A fast dual projected Newton method for l1-regularized least squares

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Two
Human behavior analysis from video data using bag-of-gestures

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three
Vector quantization of image subbands: a survey

IEEE Transactions on Image Processing
Locality-constrained and spatially regularized coding for scene categorization

CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
In defense of soft-assignment coding

ICCV '11 Proceedings of the 2011 International Conference on Computer Vision

Quantified Score

Hi-index	0.00

Visualization

Abstract

Bag-of-Words approach has played an important role in recent works for image classification. In consideration of efficiency, most methods use k- means clustering to generate the codebook. The obtained codebooks often lose the cluster size and shape information with distortion errors and low discriminative power. Though some efforts have been made to optimize codebook in sparse coding, they usually incur higher computational cost. Moreover, they ignore the correlations between codes in the following coding stage, that leads to low discriminative power of the final representation. In this paper, we propose a bilevel visual words coding approach in consideration of representation ability, discriminative power and efficiency. In the bilevel codebook generation stage, k-means and an efficient spectral clustering are respectively run in each level by taking both class information and the shapes of each visual word cluster into account. To obtain discriminative representation in the coding stage, we design a certain localized coding rule with bilevel codebook to select local bases. To further achieve an efficient coding referring to this rule, an online method is proposed to efficiently learn a projection of local descriptor to the visual words in the codebook. After projection, coding can be efficiently completed by a low dimensional localized soft-assignment. Experimental results show that our proposed bilevel visual words coding approach outperforms the state-of-the-art approaches for image classification.