Estimators and tail bounds for dimension reduction in lα (0

Authors:
Ping Li
Affiliations:
Cornell University, Ithaca, NY
Venue:
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Year:
2008

References (27):

The Johnson-Lindenstrauss Lemma and the sphericity of some graphs
Journal of Combinatorial Theory Series A

Min-wise independent permutations (extended abstract)
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing

Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing

Similarity estimation techniques from rounding algorithms
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing

Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

An elementary proof of a theorem of Johnson and Lindenstrauss
Random Structures & Algorithms

Comparing Data Streams Using Hamming Norms (How to Zero In)
IEEE Transactions on Knowledge and Data Engineering

An Approximate L1-Difference Algorithm for Massive Data Streams
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science

Stable distributions, pseudorandom generators, embeddings and data stream computation
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science

On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997

Database-friendly random projections: Johnson-Lindenstrauss with binary coins
Journal of Computer and System Sciences - Special issue on PODS 2001

Algorithmic Applications of Low-Distortion Geometric Embeddings
FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science

On the Impossibility of Dimension Reduction in l1
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science

Locality-sensitive hashing scheme based on p-stable distributions
SCG '04 Proceedings of the twentieth annual symposium on Computational geometry

On the impossibility of dimension reduction in l1
Journal of the ACM (JACM)

An algorithmic theory of learning: Robust concepts and random projection
Machine Learning

Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform
Proceedings of the thirty-eighth annual ACM symposium on Theory of computing

Stable distributions, pseudorandom generators, embeddings, and data stream computation
Journal of the ACM (JACM)

Using sketches to estimate associations
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

Very sparse stable random projections for dimension reduction in lα (0
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining

A Sketch Algorithm for Estimating Two-Way and Multi-Way Associations
Computational Linguistics

Nonlinear Estimators and Tail Bounds for Dimension Reduction in l1 Using Cauchy Random Projections
The Journal of Machine Learning Research

Estimators and tail bounds for dimension reduction in lα (0
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms

On Estimating Frequency Moments of Data Streams
APPROX '07/RANDOM '07 Proceedings of the 10th International Workshop on Approximation and the 11th International Workshop on Randomization, and Combinatorial Optimization. Algorithms and Techniques

Nonlinear estimators and tail bounds for dimension reduction in l1 using Cauchy random projections
COLT'07 Proceedings of the 20th annual conference on Learning theory

Improving random projections using marginal information
COLT'06 Proceedings of the 19th annual conference on Learning Theory

Compressed sensing
IEEE Transactions on Information Theory

Cited by (14):

Estimators and tail bounds for dimension reduction in lα (0
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms

Compressed counting
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms

Private multiparty sampling and approximation of vector combinations
Theoretical Computer Science

Improving compressed counting
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence

Fast Manhattan sketches in data streams
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Text relatedness based on a word thesaurus
Journal of Artificial Intelligence Research

Coresets and sketches for high dimensional subspace approximation problems
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms

1-pass relative-error Lp-sampling with applications
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms

On the exact space complexity of sketching and streaming small norms
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms

Near-optimal private approximation protocols via a black box transformation
Proceedings of the forty-third annual ACM symposium on Theory of computing

Fast moment estimation in data streams in optimal space
Proceedings of the forty-third annual ACM symposium on Theory of computing

Estimating hybrid frequency moments of data streams
Journal of Combinatorial Optimization

Exact sparse recovery with L0 projections
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Tight lower bound for linear sketches of moments
ICALP'13 Proceedings of the 40th international conference on Automata, Languages, and Programming - Volume Part I

Abstract

The method of stable random projections is popular in data stream computations, data mining, information retrieval, and machine learning, for efficiently computing the lα (0 We propose algorithms based on (1) the geometric mean estimator, for all 0 harmonic mean estimator, only for small α (e.g., α

• The general sample complexity bound for α ≠ 1, 2. For α = 1, [27] provided a nice argument, based on the inverse of the Cauchy density about the median, that leads to a sample complexity bound, although they did not provide the constants and their proof restricted ε to be "small enough." For general α ≠ 1, 2, however, the task becomes much more difficult. [27] provided the "conceptual promise" that a sample complexity bound similar to that for α = 1 should exist for general α, if a "non-uniform algorithm based on t-quantile" could be implemented. Such a conceptual algorithm served only to support the arguments in [27]; it was not a real implementation. We consider this to be one of the main problems left open in [27]. In this study, we propose a practical algorithm based on the geometric mean estimator and derive the sample complexity bound for all 0

• The practical and optimal algorithm for α = 0+. The l0 norm is an important case. Stable random projections can approximate the l0 norm by using α → 0+. We provide an algorithm based on the harmonic mean estimator, which is simple and statistically optimal. Its tail bounds are sharper than those derived from the geometric mean. We also discover a (possibly surprising) fact: on boolean data, stable random projections using α = 0+ with the harmonic mean estimator are about twice as accurate as normal (l2) random projections. Because high-dimensional boolean data are common, we expect this fact to be quite useful in practice.

• The precise theoretical analysis and practical implications. We provide the precise constants in the tail bounds for both the geometric mean and harmonic mean estimators. We also provide the variances (either exact or asymptotic) of the proposed estimators. These results can help practitioners choose sample sizes accurately.
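The abstract names the geometric mean and harmonic mean estimators without giving their formulas. As a rough illustration only, here is a minimal Python sketch (standard library only) built on our own assumptions, not the paper's implementation. Projection entries are taken to be symmetric α-stable with characteristic function exp(−|t|^α) and are drawn with the Chambers-Mallows-Stuck method, so each projected value x_j is stable with scale d = Σ|u_i|^α. The geometric mean estimator is written in the unbiased form implied by the fractional moment E|x|^λ = d^(λ/α) (2/π) Γ(1 − λ/α) Γ(λ) sin(πλ/2), valid for −1 < λ < α. The harmonic mean estimator uses E|x|^(−α) with a first-order (delta-method) bias correction that we derived ourselves and that may differ from the paper's exact form. The function names and the sample size k are hypothetical.

```python
import math
import random


def sample_symmetric_stable(alpha, rng):
    """One draw from the symmetric alpha-stable law with characteristic
    function exp(-|t|**alpha), via the Chambers-Mallows-Stuck method.
    For alpha = 1 the formula reduces to tan(v), a standard Cauchy variate."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))


def stable_project(u, k, alpha, seed=0):
    """Hypothetical helper: x_j = sum_i u_i * r_ij for j = 1..k, r_ij i.i.d. stable.
    By stability each x_j is symmetric alpha-stable with scale
    d = sum_i |u_i|**alpha. The same seed (and len(u)) regenerates the same r."""
    rng = random.Random(seed)
    return [sum(ui * sample_symmetric_stable(alpha, rng) for ui in u)
            for _ in range(k)]


def geometric_mean_estimate(x, alpha):
    """Unbiased geometric-mean-type estimate of d from k > 1 projections:
    prod_j |x_j|**(alpha/k) / [(2/pi) G(alpha/k) G(1 - 1/k) sin(pi alpha/(2k))]**k.
    Evaluated in log space to avoid overflow and underflow."""
    k = len(x)
    if k < 2:
        raise ValueError("need k > 1 projections")
    log_num = (alpha / k) * sum(math.log(abs(xj)) for xj in x)
    log_c = (math.log(2.0 / math.pi) + math.lgamma(alpha / k)
             + math.lgamma(1.0 - 1.0 / k)
             + math.log(math.sin(math.pi * alpha / (2 * k))))
    return math.exp(log_num - k * log_c)


def harmonic_mean_estimate(x, alpha):
    """Harmonic-mean-type estimate of d for small alpha (this sketch needs
    0 < alpha < 1/2 so that E|x|**(-2 alpha) is finite).
    Uses E|x_j|**(-alpha) = c / d with c = -(2/pi) G(-alpha) sin(pi alpha/2)."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("this sketch needs 0 < alpha < 1/2")
    k = len(x)
    c = -(2.0 / math.pi) * math.gamma(-alpha) * math.sin(math.pi * alpha / 2)
    raw = k * c / sum(abs(xj) ** (-alpha) for xj in x)
    # ratio = E|x|^(-2 alpha) / (E|x|^(-alpha))^2; (ratio - 1) / k is the
    # first-order relative bias of raw, which we subtract out.
    ratio = (-math.pi * math.gamma(-2.0 * alpha) * math.sin(math.pi * alpha)
             / (math.gamma(-alpha) * math.sin(math.pi * alpha / 2)) ** 2)
    return raw * (1.0 - (ratio - 1.0) / k)


if __name__ == "__main__":
    # Boolean data: d = sum_i |u_i|**alpha = 30 for every alpha (also the l0 norm).
    u = [1.0] * 30 + [0.0] * 30
    x_cauchy = stable_project(u, k=1000, alpha=1.0, seed=1)
    print("geometric mean, alpha=1:", geometric_mean_estimate(x_cauchy, 1.0))
    x_small = stable_project(u, k=1000, alpha=0.1, seed=2)
    print("harmonic mean, alpha=0.1:", harmonic_mean_estimate(x_small, 0.1))
```

To estimate the lα distance between two vectors under these assumptions, project both with the same seed and pass the element-wise differences of their projections to either estimator; by linearity this equals projecting u1 − u2. The boolean test vector gives d = 30 for every α, which is also its l0 norm, so the α = 0.1 run serves as a small-α stand-in for the α → 0+ case.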