Multi-Level structured image coding on high-dimensional image representation

  • Authors:
  • Li-Jia Li;Jun Zhu;Hao Su;Eric P. Xing;Li Fei-Fei

  • Affiliations:
  • Computer Science Department, Stanford University and Yahoo! Research;Machine Learning Department, Carnegie Mellon University and Tsinghua University, China;Computer Science Department, Stanford University;Machine Learning Department, Carnegie Mellon University;Computer Science Department, Stanford University

  • Venue:
  • ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part II
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Robust image representations such as classemes [1], Object Bank (OB) [2], spatial pyramid representation(SPM) [3] have been proposed, showing superior performance in various high level visual recognition tasks. Our work is motivated by the need of exploring rich structural information encoded by these image representations. In this paper, we propose a novel Multi-Level Structured Image Coding approach to uncover the structure embedded in representations with rich regular structural information by learning a structured dictionary from it. Specifically, we choose Object Bank [2] to demonstrate our algorithm since it encodes both semantics and spatial location as structural information. By using the learned structured dictionary from Object Bank, we can compute a lower-dimensional and more compact encoding of the image features while preserving and accentuating the rich semantic and spatial information of OB. Our framework is an unsupervised method based on minimizing the reconstruction error of the image and object codes, with an innovative multi-level structural regularization scheme. The object dictionary and the image code obtained by our model offer intriguing intuition of real-world image structures while preserving informative structure of the original OB. We show that our more compact representation outperforms several state-of-the-art representations (including the original OB) on a wide range of high-level visual tasks such as scene classification, image retrieval and annotation.