Distance-based indexing exploits only the triangle inequality to answer similarity queries in metric spaces. Because a metric space lacks coordinate structure, mathematical tools from R^n can be applied only indirectly, which makes theoretical study of metric-space indexing difficult. To address this problem, we formalize a "pivot space model" in which data is mapped from the metric space to R^n, preserving all pairwise distances under the L∞ distance. With this model, it can be shown that the indexing problem in metric space can be equivalently studied in R^n. Further, we show the necessity of dimension reduction for R^n and that the only effective form of dimension reduction is to select existing dimensions, i.e., pivot selection. The coordinate structure of R^n makes the application of many mathematical tools possible. In particular, Principal Component Analysis (PCA) is incorporated into a heuristic method for pivot selection and shown to be effective over a large range of workloads. We also show that PCA can be used to reliably measure the intrinsic dimension of a metric space.
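The pivot mapping described above can be illustrated with a minimal sketch; it is not the paper's implementation. It assumes edit distance on a handful of sample strings as the metric, and the names `edit_distance`, `to_pivot_space`, and `l_inf` are hypothetical, chosen for illustration. Each object is mapped to its vector of distances to the pivots. When every data point serves as a pivot, the L∞ distance between two images equals the original distance. With fewer pivots, the triangle inequality only guarantees a lower bound.

```python
from itertools import combinations

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a metric on strings, which have no coordinates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

def to_pivot_space(x, pivots, dist):
    """Map x to the vector of its distances to each pivot."""
    return tuple(dist(x, p) for p in pivots)

def l_inf(u, v):
    """L-infinity (Chebyshev) distance between two equal-length vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

# Illustrative data set (an assumption, not taken from the paper).
data = ["kitten", "sitting", "mitten", "fitting", "bitten", "sitter"]

# Case 1: every data point is a pivot, so pairwise distances are preserved.
full = {s: to_pivot_space(s, data, edit_distance) for s in data}
exact = all(l_inf(full[x], full[y]) == edit_distance(x, y)
            for x, y in combinations(data, 2))

# Case 2: only two pivots, so the L-infinity distance is a lower bound.
few_pivots = data[:2]
reduced = {s: to_pivot_space(s, few_pivots, edit_distance) for s in data}
lower_bound = all(l_inf(reduced[x], reduced[y]) <= edit_distance(x, y)
                  for x, y in combinations(data, 2))

print("all points as pivots preserves distances:", exact)
print("two pivots give a lower bound:", lower_bound)
```

The second case is what makes pivot selection matter: dropping pivots can only loosen the lower bound, so the choice of which dimensions to keep determines how much pruning power an index retains.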