Very sparse stable random projections for dimension reduction in lα (0

Authors:
Ping Li
Affiliations:
Stanford University
Venue:
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2007

Citing 45
Cited 2

The Johnson-Lindenstrauss Lemma and the sphericity of some graphs

Journal of Combinatorial Theory Series A
Term-weighting approaches in automatic text retrieval

Information Processing and Management: an International Journal
On the self-similar nature of Ethernet traffic (extended version)

IEEE/ACM Transactions on Networking (TON)
The space complexity of approximating the frequency moments

STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
Self-similarity in World Wide Web traffic: evidence and possible causes

IEEE/ACM Transactions on Networking (TON)
Latent semantic indexing: a probabilistic analysis

PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Approximate nearest neighbors: towards removing the curse of dimensionality

STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
A theory of term weighting based on exploratory data analysis

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Foundations of statistical natural language processing

Foundations of statistical natural language processing
Term Weighting in Information Retrieval Using the Term Precision Model

Journal of the ACM (JACM)
Database-friendly random projections

PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Discovering unexpected information from your competitors' web sites

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Random projection in dimensionality reduction: applications to image and text data

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Similarity estimation techniques from rounding algorithms

STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Text Categorization with Support Vector Machines. How to Represent Texts in Input Space?

Machine Learning
An elementary proof of a theorem of Johnson and Lindenstrauss

Random Structures & Algorithms
Comparing Data Streams Using Hamming Norms (How to Zero In)

IEEE Transactions on Knowledge and Data Engineering
Random Projection: A New Approach to VLSI Layout

FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
An Algorithmic Theory of Learning: Robust Concepts and Random Projection

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
An Approximate L1-Difference Algorithm for Massive Data Streams

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Stable distributions, pseudorandom generators, embeddings and data stream computation

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
On the Resemblance and Containment of Documents

SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
Database-friendly random projections: Johnson-Lindenstrauss with binary coins

Journal of Computer and System Sciences - Special issu on PODS 2001
Algorithmic Applications of Low-Distortion Geometric Embeddings

FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science
On the Impossibility of Dimension Reduction in \ell _1

FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Experiments with random projections for machine learning

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Retrieval of difficult image classes using svd-based relevance feedback

Proceedings of the 6th ACM SIGMM international workshop on Multimedia information retrieval
A comprehensive comparative study on term weighting schemes for text categorization with support vector machines

WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
On the impossibility of dimension reduction in l1

Journal of the ACM (JACM)
Random Projection-Based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining

IEEE Transactions on Knowledge and Data Engineering
An algorithmic theory of learning: Robust concepts and random projection

Machine Learning
Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform

Proceedings of the thirty-eighth annual ACM symposium on Theory of computing
Stable distributions, pseudorandom generators, embeddings, and data stream computation

Journal of the ACM (JACM)
Very sparse random projections

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Data Streams: Models and Algorithms (Advances in Database Systems)

Data Streams: Models and Algorithms (Advances in Database Systems)
Randomized algorithms and NLP: using locality sensitive hash function for high speed noun clustering

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Using sketches to estimate associations

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Comparing data streams using Hamming norms (how to zero in)

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
A Sketch Algorithm for Estimating Two-Way and Multi-Way Associations

Computational Linguistics
Nonlinear Estimators and Tail Bounds for Dimension Reduction in l1 Using Cauchy Random Projections

The Journal of Machine Learning Research
Nonlinear estimators and tail bounds for dimension reduction in l1 using Cauchy random projections

COLT'07 Proceedings of the 20th annual conference on Learning theory
Improving random projections using marginal information

COLT'06 Proceedings of the 19th annual conference on Learning Theory
Compressed sensing

IEEE Transactions on Information Theory
Support vector machines for histogram-based image classification

IEEE Transactions on Neural Networks

Estimators and tail bounds for dimension reduction in lα (0

Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Exact sparse recovery with L0 projections

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

The method of stable random projections is a useful tool for efficiently computing the lα (0 A ∈RnxD. If we multiply A with a projection matrix R ΕR Dxk (k« D),whose entries are i.i.d. samples of an α-stable distribution,then the projected matrix B = Ax R Ε R nxkx containsenough information to approximately recover the l α properties in A. We propose very sparse stable random projections, by replacing the α stable distribution with a (much simpler) mixture of a symmetric α Pareto distribution (with probability Β, 0 β Β 1) and a point mass at the origin(with probability 1-Β). This leads to a significant 1 over Β fold speedup for small Β when computing B = AxR and a 1 over Β-fold cost reduction in storing R}. By analyzing the convergence, we show that in"reasonable" datasets Β often can be very small (e.g.,D1/2 without hurting the estimation accuracy. Some numerical evaluations are conducted, on synthetic data, Web crawldata, and gene expression microarray data.