The R*-tree: an efficient and robust access method for points and rectangles. SIGMOD '90: Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data.
The SR-tree: an index structure for high-dimensional nearest neighbor queries. SIGMOD '97: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data.
On the analysis of indexing schemes. PODS '97: Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems.
The K-D-B-tree: a search structure for large multidimensional dynamic indexes. SIGMOD '81: Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data.
R-trees: a dynamic index structure for spatial searching. SIGMOD '84: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data.
The TV-tree: an index structure for high-dimensional data. The VLDB Journal: The International Journal on Very Large Data Bases, Spatial Database Systems.
Similarity Indexing with the SS-tree. ICDE '96: Proceedings of the Twelfth International Conference on Data Engineering.
When Is "Nearest Neighbor" Meaningful? ICDT '99: Proceedings of the 7th International Conference on Database Theory.
Similarity Search in High Dimensions via Hashing. VLDB '99: Proceedings of the 25th International Conference on Very Large Data Bases.
The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. VLDB '87: Proceedings of the 13th International Conference on Very Large Data Bases.
The X-tree: An Index Structure for High-Dimensional Data. VLDB '96: Proceedings of the 22nd International Conference on Very Large Data Bases.
Improved query processing and data representation techniques.
Database support for queries by image content.
Theory of nearest neighbors indexability. ACM Transactions on Database Systems (TODS).
Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space. Knowledge and Information Systems.
Understanding the meaning of a shifted sky: a general framework on extending skyline query. The VLDB Journal: The International Journal on Very Large Data Bases.
Database implementation of a model-free classifier. ADBIS '07: Proceedings of the 11th East European Conference on Advances in Databases and Information Systems.
Dimension reduction for distance-based indexing. Proceedings of the Third International Conference on SImilarity Search and APplications.
Effective monitoring by efficient fingerprint matching using a forest of NAQ-trees. Journal of Intelligent Information Systems.
Pivot selection: Dimension reduction for distance-based indexing. Journal of Discrete Algorithms.
Evaluation measures for similarity search results in process model repositories. ER '12: Proceedings of the 31st International Conference on Conceptual Modeling.
In this paper, we consider whether traditional index structures are effective in processing unstable nearest neighbors workloads. It is known that, under broad conditions, nearest neighbors workloads become unstable: distances between data points become indistinguishable from each other. We complement this earlier result by showing that if an application's workload is unstable, it is unlikely to be indexed efficiently by (almost all known) multidimensional index structures. For a broad class of data distributions, we prove that these index structures do no better than a linear scan of the data as dimensionality increases. Our result has implications for how experiments on index structures such as R-Trees, X-Trees and SR-Trees should be designed. Simply put, experiments that try to establish that these index structures scale with dimensionality should be designed to find cross-over points, rather than to show that the methods scale to an arbitrary number of dimensions. In other words, for each data distribution of interest, experiments should determine the dimensionality at which the proposed index structure deteriorates to linear scan; that linear scan will eventually dominate is a given. An important problem is to characterize analytically the rate at which index structures degrade with increasing dimensionality, because the dimensionality of a real data set may well lie within the range that a particular method can handle. The results in this paper can be regarded as a step towards solving this problem. Although we do not characterize the rate at which a structure degrades, our techniques, in contrast to earlier work, allow us to reason directly about a broad class of index structures rather than about the geometry of the nearest neighbors problem.
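The cross-over experiment recommended above can be sketched empirically. This is a minimal Python illustration under stated assumptions, not the paper's method, since the paper's results are analytical proofs. The sketch draws uniform points in [0, 1]^d and builds a hypothetical leaf level by recursive median splits. This leaf level stands in for the leaves of an R-tree-like structure and ignores internal nodes. For each query, the sketch counts the fraction of leaf pages whose bounding rectangle is no farther from the query than the true nearest neighbor. These are the pages a MINDIST-based search cannot prune (ties counted). The `threshold` of 0.5 is an assumed cutoff at which the index is treated as no better than a linear scan. A real cross-over point depends on the relative cost of random and sequential I/O. The names `build_pages`, `fraction_of_pages_read` and `crossover_dimension` are hypothetical.

```python
import math
import random


def min_dist_to_box(q, lo, hi):
    """Euclidean distance from q to the axis-aligned box [lo, hi]; 0 if q is inside.
    A point p is the degenerate box [p, p], so this also gives point distances."""
    s = 0.0
    for x, a, b in zip(q, lo, hi):
        if x < a:
            s += (a - x) ** 2
        elif x > b:
            s += (x - b) ** 2
    return math.sqrt(s)


def build_pages(points, page_size):
    """Hypothetical leaf level of a tree index: split at the median of the widest
    dimension until a page holds at most page_size points, and summarize each
    page by its minimum bounding rectangle (lo, hi)."""
    dims = range(len(points[0]))
    lo = [min(p[i] for p in points) for i in dims]
    hi = [max(p[i] for p in points) for i in dims]
    if len(points) <= page_size:
        return [(lo, hi)]
    axis = max(dims, key=lambda i: hi[i] - lo[i])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return build_pages(pts[:mid], page_size) + build_pages(pts[mid:], page_size)


def fraction_of_pages_read(points, pages, query):
    """Fraction of pages whose box is within the true 1-NN distance of the query,
    i.e. pages that a MINDIST-pruned search must still read."""
    nn = min(min_dist_to_box(query, p, p) for p in points)  # exact 1-NN distance
    return sum(min_dist_to_box(query, lo, hi) <= nn for lo, hi in pages) / len(pages)


def crossover_dimension(dims, n=1000, page_size=32, n_queries=10, threshold=0.5, seed=0):
    """Mean fraction of pages read per dimensionality, plus the first dimensionality
    reaching the assumed scan-equivalent threshold (None if never reached)."""
    rng = random.Random(seed)
    fractions = {}
    for d in dims:
        points = [[rng.random() for _ in range(d)] for _ in range(n)]
        pages = build_pages(points, page_size)
        queries = [[rng.random() for _ in range(d)] for _ in range(n_queries)]
        total = sum(fraction_of_pages_read(points, pages, q) for q in queries)
        fractions[d] = total / n_queries
    crossover = next((d for d in dims if fractions[d] >= threshold), None)
    return fractions, crossover


fractions, crossover = crossover_dimension([2, 4, 8, 16, 32, 64])
for d, f in fractions.items():
    print(f"d={d:3d}  mean fraction of pages read={f:.2f}")
print("cross-over dimensionality:", crossover)
```

The sketch reports a cross-over dimensionality for one data distribution rather than claiming that the structure scales to arbitrary d. With uniform data, the fraction of pages that cannot be pruned is expected to approach 1 as d grows. To repeat the experiment for another distribution of interest, change how `points` and `queries` are generated.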