Distance-based indexing exploits only the triangle inequality to answer similarity queries in metric spaces. Because metric spaces lack coordinate structure, mathematical tools from R^n can be applied only indirectly, which makes metric-space indexing difficult to study theoretically. A common algorithmic step toward addressing this problem is to select a small number of special points, called pivots, and map the data objects to a low-dimensional space with one dimension per pivot, where each coordinate of an object is its distance to the corresponding pivot. We formalize a "pivot space model" in which all the data objects are used as pivots, so that the data is mapped from the metric space to R^n while preserving all pairwise distances under L^∞. With this model, it can be shown that the indexing problem in metric space can be studied equivalently in R^n. Further, we show that dimension reduction is necessary in R^n and that the only effective form of dimension reduction is to select existing dimensions, i.e. pivot selection. The coordinate structure of R^n makes it possible to apply many mathematical tools. In particular, Principal Component Analysis (PCA) is incorporated into a heuristic method for pivot selection and shown to be effective over a large range of workloads. We also show that PCA can be used to reliably measure the intrinsic dimension of a metric space.
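The pivot mapping can be illustrated with a minimal sketch. It assumes Hamming distance on equal-length strings as the metric; the sample words and the helper names `pivot_map`, `linf`, and `hamming` are hypothetical choices made for illustration and do not come from the paper. The sketch checks two properties that follow from the triangle inequality: with every object used as a pivot, the L^∞ distance between mapped vectors equals the original distance, and with only a subset of pivots it is a lower bound.

```python
from itertools import combinations


def hamming(s, t):
    """Hamming distance between equal-length strings (a metric). Illustrative choice."""
    return sum(a != b for a, b in zip(s, t))


def pivot_map(objects, pivots, dist):
    """Map each object to the vector of its distances to the pivots."""
    return [[dist(x, p) for p in pivots] for x in objects]


def linf(u, v):
    """L-infinity (Chebyshev) distance between two coordinate vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


# Hypothetical sample data: five strings of length 7.
words = ["karolin", "kathrin", "kerstin", "karolyn", "martina"]
pairs = list(combinations(range(len(words)), 2))

# Complete pivot space: every object is a pivot, so each object maps to R^n, n = 5.
full = pivot_map(words, words, hamming)
exact = all(linf(full[i], full[j]) == hamming(words[i], words[j]) for i, j in pairs)

# Reduced pivot space: keep two existing dimensions (pivot selection).
reduced = pivot_map(words, words[:2], hamming)
lower_bound = all(
    linf(reduced[i], reduced[j]) <= hamming(words[i], words[j]) for i, j in pairs
)

print("distances preserved with all pivots:", exact)
print("reduced space gives a lower bound:", lower_bound)
```

The lower-bound property is what lets an index prune candidates in the reduced space without missing true results. Which two pivots to keep is the pivot selection problem; taking the first two here is arbitrary.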