Data represented geometrically in high-dimensional vector spaces arise in many applications. Images and videos are often represented by assigning a dimension to every pixel (and time step). Text documents may be represented in a vector space in which each word of the dictionary incurs a dimension. The need to manipulate such data in huge corpora such as the web, and to support various query types, gives rise to the question of how to represent the data in a lower-dimensional space so as to allow more space- and time-efficient computation. Linear mappings are an attractive approach to this problem because the mapped input can be readily fed into popular algorithms that operate on linear spaces (such as principal-component analysis, PCA) while avoiding the curse of dimensionality. That such mappings even exist became known in computer science following seminal work by Johnson and Lindenstrauss in the early 1980s; the underlying technique is often called "random projection." The complexity of the mapping itself, essentially the product of a vector with a dense matrix, did not attract much attention until recently. In 2006, we discovered a way to "sparsify" the matrix via a computational version of Heisenberg's Uncertainty Principle. This yielded a significant speedup while retaining the practical simplicity of the standard Johnson-Lindenstrauss projection. We describe the improvement in this article, together with some of its applications.
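The ideas above can be sketched in code. The following is an illustrative sketch only (the function names and parameters are ours, not from the article): first the classical dense Johnson-Lindenstrauss projection, then a transform-based variant in the spirit of the sparsified construction — here the closely related subsampled randomized Hadamard transform, which first spreads a vector's mass evenly across coordinates (the uncertainty-principle step) and then keeps only a few of them.

```python
import numpy as np

def jl_dense(X, k, seed=0):
    """Classical JL projection: multiply by a dense random Gaussian matrix.

    Rows of X (n x d) are mapped to k dimensions; pairwise Euclidean
    distances are preserved up to a (1 +/- eps) factor with high
    probability when k = O(log n / eps^2).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))  # entries N(0, 1/k)
    return X @ R  # O(n * d * k) time -- the dense-matrix bottleneck

def fwht(x):
    """Fast Walsh-Hadamard transform, O(d log d); len(x) a power of two."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def jl_fast(x, k, seed=0):
    """Transform-based projection (subsampled randomized Hadamard transform).

    Random signs D followed by a Hadamard transform spread the vector's
    mass evenly over all coordinates, so sampling k of them (suitably
    rescaled) preserves the norm -- no dense matrix product is needed.
    """
    rng = np.random.default_rng(seed)
    d = len(x)  # assumed to be a power of two here
    D = rng.choice([-1.0, 1.0], size=d)
    z = fwht(D * x) / np.sqrt(d)           # orthonormal, so ||z|| == ||x||
    idx = rng.choice(d, size=k, replace=False)
    return z[idx] * np.sqrt(d / k)         # rescale to preserve expected norm

# Usage: both maps keep Euclidean geometry roughly unchanged.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 1024))
Y = jl_dense(X, 256)     # project all 20 points to 256 dimensions
y = jl_fast(X[0], 256)   # project a single point via the fast transform
```

In the dense version the matrix-vector product dominates the cost; the transform-based version replaces it with an O(d log d) Hadamard transform, which is the source of the speedup the article describes. (The article's actual construction applies a sparse random matrix after the transform rather than the plain coordinate sampling used in this sketch.)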