Learning Big (Image) Data via Coresets for Dictionaries

  • Authors:
  • Dan Feldman; Micha Feigin; Nir Sochen

  • Affiliations:
  • CSAIL, MIT, Cambridge, USA; Media Lab, MIT, Cambridge, USA; Department of Applied Mathematics, Tel Aviv University, Tel Aviv, Israel

  • Venue:
  • Journal of Mathematical Imaging and Vision
  • Year:
  • 2013

Abstract

Signal and image processing have seen an explosion of interest in the last few years in a new form of signal/image characterization via the concept of sparsity with respect to a dictionary. An active field of research is dictionary learning: the representation of a given large set of vectors (e.g., signals or images) as linear combinations of only a few vectors (patterns). To further reduce the size of the representation, the combinations are usually required to be sparse, i.e., each signal is a linear combination of only a small number of patterns.

This paper suggests a new computational approach to the problem of dictionary learning, known in computational geometry as coresets. A coreset for dictionary learning is a small, smart, non-uniform sample of the input signals such that the quality of any given dictionary with respect to the input can be approximated via the coreset. In particular, the optimal dictionary for the input can be approximated by learning on the coreset. Since the coreset is small, the learning is faster. Moreover, using merge-and-reduce, the coreset can be constructed for streaming signals that do not fit in memory, and it can also be computed in parallel.

We apply our coresets to dictionary learning of images using the K-SVD algorithm and bound their size and approximation error analytically. Our simulations demonstrate gains of up to a factor of 60 in computation time with the same, and even better, performance. We also demonstrate our ability to perform computations on larger patches and high-definition images, where the traditional approach breaks down.
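
The minimal Python/NumPy sketch below illustrates the general coreset idea described in the abstract: a small, weighted, non-uniform sample whose cost approximates the cost of the full data for any fixed dictionary. The sampling distribution (proportional to squared signal norms), the 1-sparse cost function, and all names are illustrative assumptions for this sketch; they are not the paper's sensitivity-based construction or its K-SVD pipeline.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic "signals": n patches of dimension d.
    n, d, k = 20000, 64, 32
    X = rng.normal(size=(n, d))

    def sparse_cost(X, D, weights=None):
        """Sum of (weighted) squared errors of the best 1-sparse
        approximation of each row of X by a scaled atom of D.
        Atoms (rows of D) are assumed to be unit-norm."""
        proj = X @ D.T                       # inner products with all atoms
        best = np.max(proj ** 2, axis=1)     # best 1-sparse fit per signal
        err = np.sum(X ** 2, axis=1) - best  # residual squared error
        if weights is not None:
            err = err * weights
        return err.sum()

    # A random unit-norm dictionary, standing in for any candidate dictionary.
    D = rng.normal(size=(k, d))
    D /= np.linalg.norm(D, axis=1, keepdims=True)

    # Non-uniform sample: probabilities proportional to squared norms,
    # a crude stand-in for the sensitivity-based distribution in the paper.
    p = np.sum(X ** 2, axis=1)
    p /= p.sum()
    m = 1000                                 # coreset size << n
    idx = rng.choice(n, size=m, replace=True, p=p)
    weights = 1.0 / (m * p[idx])             # unbiased importance weights

    full_cost = sparse_cost(X, D)
    coreset_cost = sparse_cost(X[idx], D, weights)
    print(f"full: {full_cost:.1f}  coreset: {coreset_cost:.1f}  "
          f"rel. error: {abs(full_cost - coreset_cost) / full_cost:.3f}")

Because the importance weights make the sampled cost an unbiased estimate of the full cost, a dictionary can be learned on the small weighted sample instead of the entire data set, which is the source of the speedups reported in the abstract; the paper's analysis bounds the coreset size needed for a given approximation error.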