The Johnson-Lindenstrauss Lemma and the sphericity of some graphs
Journal of Combinatorial Theory Series A
Min-wise independent permutations (extended abstract)
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Similarity estimation techniques from rounding algorithms
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
An elementary proof of a theorem of Johnson and Lindenstrauss
Random Structures & Algorithms
Comparing Data Streams Using Hamming Norms (How to Zero In)
IEEE Transactions on Knowledge and Data Engineering
An Approximate L1-Difference Algorithm for Massive Data Streams
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Stable distributions, pseudorandom generators, embeddings and data stream computation
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
Database-friendly random projections: Johnson-Lindenstrauss with binary coins
Journal of Computer and System Sciences - Special issu on PODS 2001
Algorithmic Applications of Low-Distortion Geometric Embeddings
FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science
On the Impossibility of Dimension Reduction in \ell _1
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Locality-sensitive hashing scheme based on p-stable distributions
SCG '04 Proceedings of the twentieth annual symposium on Computational geometry
On the impossibility of dimension reduction in l1
Journal of the ACM (JACM)
Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform
Proceedings of the thirty-eighth annual ACM symposium on Theory of computing
Stable distributions, pseudorandom generators, embeddings, and data stream computation
Journal of the ACM (JACM)
Using sketches to estimate associations
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Very sparse stable random projections for dimension reduction in lα (0
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
A Sketch Algorithm for Estimating Two-Way and Multi-Way Associations
Computational Linguistics
Nonlinear Estimators and Tail Bounds for Dimension Reduction in l1 Using Cauchy Random Projections
The Journal of Machine Learning Research
Estimators and tail bounds for dimension reduction in lα (0
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
On Estimating Frequency Moments of Data Streams
APPROX '07/RANDOM '07 Proceedings of the 10th International Workshop on Approximation and the 11th International Workshop on Randomization, and Combinatorial Optimization. Algorithms and Techniques
Nonlinear estimators and tail bounds for dimension reduction in l1 using Cauchy random projections
COLT'07 Proceedings of the 20th annual conference on Learning theory
Improving random projections using marginal information
COLT'06 Proceedings of the 19th annual conference on Learning Theory
IEEE Transactions on Information Theory
Estimators and tail bounds for dimension reduction in lα (0
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Private multiparty sampling and approximation of vector combinations
Theoretical Computer Science
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Fast Manhattan sketches in data streams
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Text relatedness based on a word thesaurus
Journal of Artificial Intelligence Research
Coresets and sketches for high dimensional subspace approximation problems
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
1-pass relative-error Lp-sampling with applications
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
On the exact space complexity of sketching and streaming small norms
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Near-optimal private approximation protocols via a black box transformation
Proceedings of the forty-third annual ACM symposium on Theory of computing
Fast moment estimation in data streams in optimal space
Proceedings of the forty-third annual ACM symposium on Theory of computing
Estimating hybrid frequency moments of data streams
Journal of Combinatorial Optimization
Exact sparse recovery with L0 projections
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Tight lower bound for linear sketches of moments
ICALP'13 Proceedings of the 40th international conference on Automata, Languages, and Programming - Volume Part I
Hi-index | 0.00 |
The method of stable random projections is popular in data stream computations, data mining, information retrieval, and machine learning, for efficiently computing the lα (0 We propose algorithms based on (1) the geometric mean estimator, for all 0 harmonic mean estimator, only for small α (e.g., α • The general sample complexity bound for α ≠ 1,2. For α = 1, [27] provided a nice argument based on the inverse of Cauchy density about the median, leading to a sample complexity bound, although they did not provide the constants and their proof restricted ε to be "small enough." For general α ≠ 1, 2, however, the task becomes much more difficult. [27] provided the "conceptual promise" that the sample complexity bound similar to that for α = 1 should exist for general α, if a "non-uniform algorithm based on t-quantile" could be implemented. Such a conceptual algorithm was only for supporting the arguments in [27], not a real implementation. We consider this is one of the main problems left open in [27]. In this study, we propose a practical algorithm based on the geometric mean estimator and derive the sample complexity bound for all 0 • The practical and optimal algorithm for α = 0+ The l0 norm is an important case. Stable random projections can provide an approximation to the l0 norm using α → 0+. We provide an algorithm based on the harmonic mean estimator, which is simple and statistically optimal. Its tail bounds are sharper than the bounds derived based on the geometric mean. We also discover a (possibly surprising) fact: in boolean data, stable random projections using α = 0+ with the harmonic mean estimator will be about twice as accurate as (l2) normal random projections. Because high-dimensional boolean data are common, we expect this fact will be practically quite useful. • The precise theoretical analysis and practical implications We provide the precise constants in the tail bounds for both the geometric mean and harmonic mean estimators. We also provide the variances (either exact or asymptotic) for the proposed estimators. These results can assist practitioners to choose sample sizes accurately.