Data represented geometrically in high-dimensional vector spaces arise in many applications. Images and videos are often represented by assigning a dimension to every pixel (and time step). Text documents may be represented in a vector space in which each word of the dictionary incurs a dimension. The need to manipulate such data in huge corpora such as the web, and to support various query types, gives rise to the question of how to represent the data in a lower-dimensional space so as to allow more space- and time-efficient computation. Linear mappings are an attractive approach to this problem because the mapped input can be readily fed into popular algorithms that operate on linear spaces (such as principal-component analysis, PCA) while avoiding the curse of dimensionality. That such mappings even exist became known in computer science following seminal work by Johnson and Lindenstrauss in the early 1980s; the underlying technique is often called "random projection." The complexity of the mapping itself, essentially the product of a vector with a dense matrix, did not attract much attention until recently. In 2006, we discovered a way to "sparsify" the matrix via a computational version of Heisenberg's Uncertainty Principle. This yielded a significant speedup while retaining the practical simplicity of the standard Johnson-Lindenstrauss projection. We describe the improvement in this article, together with some of its applications.
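The ideas above can be sketched in code. The following is an illustrative sketch only (the function names and parameters are ours, not from the article): first the classical dense Johnson-Lindenstrauss projection, then a transform-based variant in the spirit of the sparsified construction — here the closely related subsampled randomized Hadamard transform, which first spreads a vector's mass evenly across coordinates (the uncertainty-principle step) and then keeps only a few of them.

```python
import numpy as np

def jl_dense(X, k, seed=0):
    """Classical JL projection: multiply by a dense random Gaussian matrix.

    Rows of X (n x d) are mapped to k dimensions; pairwise Euclidean
    distances are preserved up to a (1 +/- eps) factor with high
    probability when k = O(log n / eps^2).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))  # entries N(0, 1/k)
    return X @ R  # O(n * d * k) time -- the dense-matrix bottleneck

def fwht(x):
    """Fast Walsh-Hadamard transform, O(d log d); len(x) a power of two."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def jl_fast(x, k, seed=0):
    """Transform-based projection (subsampled randomized Hadamard transform).

    Random signs D followed by a Hadamard transform spread the vector's
    mass evenly over all coordinates, so sampling k of them (suitably
    rescaled) preserves the norm -- no dense matrix product is needed.
    """
    rng = np.random.default_rng(seed)
    d = len(x)  # assumed to be a power of two here
    D = rng.choice([-1.0, 1.0], size=d)
    z = fwht(D * x) / np.sqrt(d)           # orthonormal, so ||z|| == ||x||
    idx = rng.choice(d, size=k, replace=False)
    return z[idx] * np.sqrt(d / k)         # rescale to preserve expected norm

# Usage: both maps keep Euclidean geometry roughly unchanged.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 1024))
Y = jl_dense(X, 256)     # project all 20 points to 256 dimensions
y = jl_fast(X[0], 256)   # project a single point via the fast transform
```

In the dense version the matrix-vector product dominates the cost; the transform-based version replaces it with an O(d log d) Hadamard transform, which is the source of the speedup the article describes. (The article's actual construction applies a sparse random matrix after the transform rather than the plain coordinate sampling used in this sketch.)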