Large-scale multilabel propagation based on efficient sparse graph construction

Authors:
Xiangyu Chen; Yadong Mu; Hairong Liu; Shuicheng Yan; Yong Rui; Tat-Seng Chua
Affiliations:
National University of Singapore and Institute for Infocomm Research, Singapore; Columbia University, Singapore; Purdue University, Singapore; National University of Singapore, Singapore; Microsoft Research Asia, Beijing, P. R. China; National University of Singapore
Venue:
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Year:
2013

Abstract

With the popularity of photo-sharing websites, the number of web images has grown to an unprecedented scale. Annotating such large-scale data manually would require an enormous amount of human labor and is therefore unaffordable. Motivated by this challenge, we propose a novel sparse graph based multilabel propagation (SGMP) scheme for very large-scale datasets. We further investigate both the efficiency and the accuracy of image annotation under different graph construction strategies, whose formulations simultaneously account for Gaussian noise and non-Gaussian sparse noise. The proposed approach outperforms state-of-the-art algorithms through three contributions:

(1) For large-scale graph construction, we propose a simple yet efficient sparse graph construction scheme based on LSH (Locality Sensitive Hashing) to speed up construction: the data are first hashed with LSH, and a sparse graph is then built within each individual hashing bucket. Multilabel propagation is performed on this hashing-based graph.

(2) To further improve accuracy, we propose a novel sparsity-induced scalable graph construction scheme based on a general sparse optimization framework. Sparsity implies a very strong prior: in large-scale optimization, most variables are zero once the solution reaches the optimum. By exploiting this prior, a large-scale sparse optimization problem can be solved through a series of much smaller subproblems.

(3) For multilabel propagation, unlike traditional algorithms that propagate each label independently, our method encodes the label information of an image as a unit label confidence vector (i.e., a probability distribution over labels), which naturally imposes inter-label constraints and lets the labels interact during propagation.
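The hash-then-build idea in (1) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes random-hyperplane (sign) LSH as the hash family and, inside each bucket, an l1-regularized least-squares reconstruction of each sample from its bucket peers, solved with plain ISTA. The function names (`lsh_buckets`, `sparse_code`, `lsh_sparse_graph`), the bit count, `lam`, and the absolute-value/symmetrization step are hypothetical choices. The sketch models only Gaussian reconstruction error; the sparse non-Gaussian noise term mentioned above is omitted.

```python
import numpy as np
from collections import defaultdict


def lsh_buckets(X, n_bits=8, seed=0):
    """Group row indices of X by an n_bits random-hyperplane (sign) hash."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))
    codes = (X @ planes) > 0  # (n, n_bits) boolean sign pattern per sample
    buckets = defaultdict(list)
    for i, code in enumerate(codes):
        buckets[code.tobytes()].append(i)
    return list(buckets.values())


def sparse_code(D, x, lam=0.05, n_iter=200):
    """ISTA for min_w 0.5 * ||x - D w||^2 + lam * ||w||_1."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12  # Lipschitz constant of the smooth part
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = w - D.T @ (D @ w - x) / L  # gradient step on the least-squares term
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return w


def lsh_sparse_graph(X, n_bits=8, lam=0.05, seed=0):
    """Hypothetical helper: l1-reconstruction graph restricted to LSH buckets."""
    n = X.shape[0]
    W = np.zeros((n, n))  # dense for clarity; a real system would store this sparsely
    for idx in lsh_buckets(X, n_bits, seed):
        for i in idx:
            others = [j for j in idx if j != i]
            if not others:
                continue  # singleton bucket: no candidate neighbors
            w = sparse_code(X[others].T, X[i], lam)  # columns = bucket peers
            W[i, others] = np.abs(w)
    return 0.5 * (W + W.T)  # symmetrize into an undirected affinity matrix
```

For example, on a small two-cluster dataset `lsh_sparse_graph(X, n_bits=4)` returns a symmetric, nonnegative affinity matrix with an empty diagonal. Each sparse coding problem only involves the samples in one bucket, which is what keeps construction cheap; more hash bits give smaller buckets (cheaper coding) at the cost of missing more true neighbors, which is why practical LSH setups usually combine several hash tables.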
The entire propagation problem is then formulated in terms of the Kullback-Leibler divergence between probability distributions, which guides the propagation of the supervision information. Extensive experiments on the benchmark dataset NUS-WIDE (270k images) and its lite version NUS-WIDE-LITE (56k images) demonstrate the effectiveness and scalability of the proposed multilabel propagation scheme.
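To make the KL-based formulation concrete, the sketch below swaps in measure propagation (Subramanya and Bilmes, 2008), an established graph propagation method whose objective is built from KL divergences between per-node label distributions. It is an assumption-driven illustration, not necessarily the exact SGMP objective. Each image's binary label set is turned into a unit-sum confidence vector `r_i` (an image tagged with two labels gets 0.5 on each), and the objective `sum_{i in L} KL(r_i || q_i) + mu * sum_ij w'_ij KL(p_i || q_j) - nu * sum_i H(p_i)` is minimized by alternating closed-form updates of `p` and `q`. The function names and the values of `mu`, `nu`, `alpha`, and `n_iter` are hypothetical.

```python
import numpy as np


def to_label_distributions(Y):
    """Row-normalize a binary label matrix (n x c) into unit-sum confidence vectors."""
    Y = np.asarray(Y, dtype=float)
    s = Y.sum(axis=1, keepdims=True)
    # Rows with no labels (unlabeled images) become uniform; they are masked out later.
    return np.divide(Y, s, out=np.full_like(Y, 1.0 / Y.shape[1]), where=s > 0)


def measure_propagation(W, R, labeled, mu=1.0, nu=0.01, alpha=1.0, n_iter=100, eps=1e-12):
    """Alternating minimization of
        sum_{i in L} KL(r_i || q_i) + mu * sum_ij w'_ij KL(p_i || q_j) - nu * sum_i H(p_i),
    with w' = W + alpha * I. Returns the label distributions p (n x c)."""
    n, c = R.shape
    Wp = W + alpha * np.eye(n)        # self-loops tie p_i to q_i
    delta = np.zeros(n)
    delta[labeled] = 1.0              # indicator of labeled nodes
    gamma = nu + mu * Wp.sum(axis=1)  # gamma_i = nu + mu * sum_j w'_ij
    q = np.full((n, c), 1.0 / c)
    for _ in range(n_iter):
        # p-step: p_i propto exp((mu / gamma_i) * sum_j w'_ij log q_j)
        logits = (mu / gamma)[:, None] * (Wp @ np.log(q + eps))
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        # q-step: q_i propto delta_i * r_i + mu * sum_j w'_ji p_j
        num = delta[:, None] * R + mu * (Wp.T @ p)
        q = num / num.sum(axis=1, keepdims=True)
    return p
```

On a four-node chain where node 0 carries label 0 and node 3 carries label 1, the two unlabeled middle nodes end up favoring the label of their nearer labeled neighbor. Because every label lives in one normalized vector, raising confidence in one label necessarily lowers it in the others, which is the kind of inter-label coupling that independent per-label propagation lacks.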