Theory of nearest neighbors indexability

Authors:
Uri Shaft; Raghu Ramakrishnan
Affiliations:
Oracle USA, Redwood Shores, CA; University of Wisconsin-Madison, Madison, WI
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2006

Abstract

In this article, we consider whether traditional index structures are effective in processing unstable nearest neighbors workloads. It is known that under broad conditions, nearest neighbors workloads become unstable: distances between data points become indistinguishable from each other. We complement this earlier result by showing that if the workload for an application is unstable, you are not likely to be able to index it efficiently using (almost all known) multidimensional index structures. For a broad class of data distributions, we prove that these index structures will do no better than a linear scan of the data as dimensionality increases.

Our result has implications for how experiments should be designed on index structures such as R-Trees, X-Trees, and SR-Trees: simply put, experiments trying to establish that these index structures scale with dimensionality should be designed to establish crossover points, rather than to show that the methods scale to an arbitrary number of dimensions. In other words, experiments should seek to establish the dimensionality of the dataset at which the proposed index structure deteriorates to linear scan, for each data distribution of interest; that linear scan will eventually dominate is a given.

An important problem is to analytically characterize the rate at which index structures degrade with increasing dimensionality, because the dimensionality of a real data set may well be in the range that a particular method can handle. The results in this article can be regarded as a step toward solving this problem. Although we do not characterize the rate at which a structure degrades, our techniques allow us to reason directly about a broad class of index structures rather than the geometry of the nearest neighbors problem, in contrast to earlier work.
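
The instability described in the abstract, where distances between data points become indistinguishable as dimensionality grows, can be illustrated with a small numerical sketch. This is not the authors' experiment. It assumes points drawn independently and uniformly from the unit hypercube and Euclidean distance, and the function name `relative_contrast` and its parameters are hypothetical names chosen for this illustration. It reports how much farther the farthest point is than the nearest one, relative to the nearest distance, for a single random query.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """Return (d_max - d_min) / d_min for one random query point.

    Assumption for illustration only: the query and all data points are
    drawn independently and uniformly from the unit hypercube [0, 1]^dim.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # A value near zero means the nearest and farthest points are almost
    # equally far from the query, which is the unstable regime.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Under these assumptions the printed value should shrink as `dim` grows. A sketch like this only illustrates the instability of the workload. The dimensionality at which a given index structure deteriorates to linear scan, the crossover point the abstract asks experiments to establish, would still have to be measured separately for each structure and each data distribution of interest.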