Large-scale multilabel propagation based on efficient sparse graph construction

  • Authors:
  • Xiangyu Chen; Yadong Mu; Hairong Liu; Shuicheng Yan; Yong Rui; Tat-Seng Chua

  • Affiliations:
  • National University of Singapore and Institute for Infocomm Research, Singapore; Columbia University, Singapore; Purdue University, Singapore; National University of Singapore, Singapore; Microsoft Research Asia, Beijing, P. R. China; National University of Singapore

  • Venue:
  • ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
  • Year:
  • 2013

Abstract

With the popularity of photo-sharing websites, the number of web images has grown to an unprecedented magnitude. Annotating data at this scale manually would cost a huge amount of human resources and is thus unaffordable. Motivated by this challenging problem, we propose a novel sparse-graph-based multilabel propagation (SGMP) scheme for very large datasets. Both the efficacy and the accuracy of image annotation are further investigated under different graph construction strategies, whose formulations consider Gaussian noise and non-Gaussian sparse noise simultaneously. Our proposed approach outperforms state-of-the-art algorithms by focusing on three aspects: (1) For large-scale graph construction, we propose a simple yet efficient LSH (Locality Sensitive Hashing)-based sparse graph construction scheme to speed up the construction. Multilabel propagation is performed on this hashing-based graph, which is obtained by applying LSH and then constructing a sparse graph within each individual hash bucket; (2) To further improve accuracy, we propose a novel sparsity-induced scalable graph construction scheme based on a general sparse optimization framework. Sparsity essentially implies a very strong prior: in large-scale optimization, most variables are zero when the solution reaches the optimum. By exploiting this prior, the solution of a large-scale sparse optimization problem can be derived by solving a series of much smaller subproblems; (3) For multilabel propagation, unlike traditional algorithms that propagate each label independently, our proposed propagation first encodes the label information of an image as a unit label confidence vector, which naturally imposes inter-label constraints and lets the labels interact. The entire propagation problem is then formulated in terms of the Kullback-Leibler divergence defined on probabilistic distributions, which guides the propagation of the supervision information.
Extensive experiments on the benchmark dataset NUS-WIDE with 270k images and its lite version NUS-WIDE-LITE with 56k images demonstrate the effectiveness and scalability of the proposed multilabel propagation scheme.
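
As a rough illustration of points (1) and (3), the sketch below buckets feature vectors with LSH, links points only inside each bucket, and compares label confidence vectors using the Kullback-Leibler divergence. It is not the authors' implementation. The abstract does not state the hash family, so random-hyperplane hashing is an assumption. A plain k-nearest-neighbour rule with Gaussian weights stands in for the paper's sparse graph construction. Reading "unit label confidence vector" as a vector normalised to sum to one is also an assumption. All function names and parameters (`lsh_buckets`, `bucketed_knn_graph`, `to_unit_confidence`, `kl_divergence`, `n_bits`, `k`) are hypothetical.

```python
import numpy as np

def lsh_buckets(X, n_bits=8, seed=0):
    """Group row indices of X by the sign pattern of random projections.

    Random-hyperplane hashing is an assumed choice of LSH family.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))
    bits = (X @ planes) > 0                  # (n_samples, n_bits) boolean codes
    buckets = {}
    for i, code in enumerate(bits):
        buckets.setdefault(code.tobytes(), []).append(i)
    return list(buckets.values())

def bucketed_knn_graph(X, k=3, n_bits=8, seed=0):
    """Return {i: [(j, weight), ...]} with edges only inside LSH buckets.

    k-NN with Gaussian weights is a stand-in for the paper's sparse
    graph construction, which is not specified in the abstract.
    """
    graph = {i: [] for i in range(X.shape[0])}
    for members in lsh_buckets(X, n_bits, seed):
        if len(members) < 2:
            continue                         # a singleton bucket has no edges
        idx = np.asarray(members)
        pts = X[idx]
        # Pairwise squared Euclidean distances within this bucket only.
        d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d2, np.inf)         # exclude self-loops
        kk = min(k, len(members) - 1)
        for row, i in enumerate(idx):
            nearest = np.argsort(d2[row])[:kk]
            graph[int(i)] = [(int(idx[j]), float(np.exp(-d2[row, j])))
                             for j in nearest]
    return graph

def to_unit_confidence(label_matrix, eps=1e-12):
    """Normalise each row of a nonnegative label matrix to sum to one."""
    L = np.asarray(label_matrix, dtype=float) + eps   # eps avoids log(0)
    return L / L.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(p || q) for two strictly positive probability vectors."""
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

Because distances are computed only within a bucket, the cost per bucket is quadratic in the bucket size rather than in the full dataset size, which is the intuition behind building the graph bucket by bucket. The normalised rows make labels compete for probability mass, so raising the confidence of one label lowers that of the others.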