Undo the codebook bias by linear transformation for visual applications

Authors:
Chunjie Zhang;Yifan Zhang;Shuhui Wang;Junbiao Pang;Chao Liang;Qingming Huang;Qi Tian
Affiliations:
University of Chinese Academy of Sciences, Beijing, China;Institute of Automation, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Beijing University of Technology, Beijing, China;School of Computer, Wuhan University, Wuhan, China;University of Chinese Academy of Sciences, Beijing, China;University of Texas at San Antonio, TX, USA
Venue:
Proceedings of the 21st ACM international conference on Multimedia
Year:
2013

Abstract

The bag of visual words (BoW) model and its variants have demonstrated their effectiveness for visual applications and are widely used by researchers. The BoW model first extracts local features and generates the corresponding codebook, whose elements are viewed as visual words. The local features within each image are then encoded to obtain the final histogram representation. However, the codebook is dataset dependent and has to be generated for each image dataset. This costs a lot of computational time and weakens the generalization power of the BoW model. To solve these problems, we propose to undo the dataset bias by linear transformation of the codebook. To represent every point within the local feature space using Euclidean distance, the number of bases should be no less than the dimension of the space. Hence, each codebook can be viewed as a linear transformation of these bases. In this way, we can transform pre-learned codebooks for a new dataset. However, not all visual words are equally important for the new dataset, so it is more effective to use sparsity constraints to select the most discriminative visual words for transformation. We propose an alternating optimization algorithm to jointly search for the optimal linear transformation matrices and the encoding parameters. Image classification experiments on several image datasets show the effectiveness of the proposed method.
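
The abstract does not state the objective function, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a reconstruction objective 0.5*||X - B W C||^2 + lam*||W||_1 + 0.5*mu*||C||^2, where B is a pre-learned codebook, W is a sparse linear transformation, and C holds the encoding parameters. The function name `fit_codebook_transform`, the ridge penalty on C, the parameters `lam` and `mu`, and the proximal-gradient update for W are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(A, t):
    # Proximal operator of the L1 norm: shrinks entries toward zero.
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def objective(X, B, W, C, lam, mu):
    # Assumed objective (not taken from the paper):
    # reconstruction error + L1 sparsity on W + ridge penalty on C.
    resid = X - B @ W @ C
    return 0.5 * np.sum(resid ** 2) + lam * np.abs(W).sum() + 0.5 * mu * np.sum(C ** 2)

def solve_codes(X, D, mu):
    # Closed-form ridge regression for the codes C given codebook D.
    M = D.shape[1]
    return np.linalg.solve(D.T @ D + mu * np.eye(M), D.T @ X)

def fit_codebook_transform(X, B, n_words, lam=0.05, mu=1e-2, n_iter=30, seed=0):
    """Alternate between the codes C and a sparse transformation W.

    X: (d, N) local features from the new dataset.
    B: (d, K) pre-learned codebook, one visual word per column.
    Returns W (K, n_words), C (n_words, N), and the objective history.
    """
    rng = np.random.default_rng(seed)
    K = B.shape[1]
    W = rng.standard_normal((K, n_words)) / np.sqrt(K)
    history = []
    for _ in range(n_iter):
        # C-step: with W fixed, the transformed codebook is D = B W.
        C = solve_codes(X, B @ W, mu)
        # W-step: one proximal gradient step with step size 1/L, where L
        # bounds the Lipschitz constant of the smooth part's gradient in W.
        grad = B.T @ (B @ W @ C - X) @ C.T
        L = np.linalg.norm(B, 2) ** 2 * np.linalg.norm(C, 2) ** 2 + 1e-12
        W = soft_threshold(W - grad / L, lam / L)
        history.append(objective(X, B, W, C, lam, mu))
    # Refit the codes so that they match the final W.
    C = solve_codes(X, B @ W, mu)
    return W, C, history

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    B = rng.standard_normal((8, 12))   # synthetic "pre-learned" codebook
    X = rng.standard_normal((8, 40))   # synthetic local features
    W, C, history = fit_codebook_transform(X, B, n_words=6)
    print("objective: first %.3f, last %.3f" % (history[0], history[-1]))
    print("fraction of zero entries in W: %.2f" % np.mean(W == 0))
```

In this sketch, rows of W that shrink to zero correspond to visual words of the pre-learned codebook that are not used for the new dataset, which is one plausible reading of the selection step described above. The discriminative criterion mentioned in the abstract is not modeled here.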