The method of stable random projections is a useful tool for efficiently computing the lα (0 < α ≤ 2) properties of a data matrix A ∈ R^(n×D). If we multiply A by a projection matrix R ∈ R^(D×k) (k ≪ D), whose entries are i.i.d. samples of an α-stable distribution, then the projected matrix B = A × R ∈ R^(n×k) contains enough information to approximately recover the lα properties of A. We propose very sparse stable random projections, which replace the α-stable distribution with a (much simpler) mixture of a symmetric α-Pareto distribution (with probability β, 0 < β ≤ 1) and a point mass at the origin (with probability 1 − β). For small β, this leads to a significant 1/β-fold speedup when computing B = A × R and a 1/β-fold cost reduction in storing R. By analyzing the convergence, we show that in "reasonable" datasets β can often be very small (e.g., D^(−1/2)) without hurting the estimation accuracy. Some numerical evaluations are conducted on synthetic data, Web crawl data, and gene expression microarray data.
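The construction described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "symmetric α-Pareto" means a random sign times a magnitude with tail P(|Z| > t) = t^(−α) for t ≥ 1, and the helper names `sample_sparse_projection` and `project` are hypothetical. Only the sampling of a sparse R and the product B = A × R are shown; the estimators that recover lα properties from B are not covered here.

```python
import random


def sample_sparse_projection(D, k, alpha, beta, rng):
    """Sample R (D x k) as k sparse columns, each a dict {row_index: value}.

    Assumption: each entry is zero with probability 1 - beta; otherwise it is
    a random sign times a Pareto magnitude with P(|Z| > t) = t**(-alpha), t >= 1.
    """
    columns = []
    for _ in range(k):
        col = {}
        for i in range(D):
            if rng.random() < beta:
                u = 1.0 - rng.random()           # uniform on (0, 1]
                magnitude = u ** (-1.0 / alpha)  # inverse-CDF Pareto sample
                sign = 1.0 if rng.random() < 0.5 else -1.0
                col[i] = sign * magnitude
        columns.append(col)
    return columns


def project(A, columns):
    """Compute B = A x R, touching only the stored nonzero entries of R."""
    return [
        [sum(row[i] * v for i, v in col.items()) for col in columns]
        for row in A
    ]


rng = random.Random(0)
D, k, alpha = 400, 8, 1.0
beta = D ** -0.5                                  # the D^(-1/2) example from the text
R = sample_sparse_projection(D, k, alpha, beta, rng)
A = [[rng.random() for _ in range(D)] for _ in range(3)]  # toy data, n = 3
B = project(A, R)
nonzeros = sum(len(col) for col in R)
print(len(B), len(B[0]), nonzeros, D * k)
```

Because only about β·D·k entries of R are nonzero, both the storage for R and the work per row of A shrink by roughly a factor of 1/β compared with a dense projection matrix.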