On Optimizing Nearest Neighbor Queries in High-Dimensional Data Spaces

Authors:
Stefan Berchtold; Christian Böhm; Daniel A. Keim; Florian Krebs; Hans-Peter Kriegel
Venue:
ICDT '01 Proceedings of the 8th International Conference on Database Theory
Year:
2001

Abstract

Nearest neighbor queries in high-dimensional space are important in many applications, especially in content-based indexing of multimedia data. Optimizing query processing requires accurate models for estimating query processing costs. In this paper, we propose a new cost model for nearest neighbor queries in high-dimensional space and apply it to enhance the performance of high-dimensional index structures. The model is based on new insights into effects occurring in high-dimensional space and provides a closed formula for the processing costs of nearest neighbor queries as a function of the dimensionality, the block size, and the database size. From the wide range of possible applications of our model, we select two examples: first, we use the model to prove the known linear complexity of the nearest neighbor search problem in high-dimensional space; second, we provide a technique for optimizing the block size. For data of medium dimensionality, the optimized block size yields significant speed-ups in query processing time compared with traditional block sizes and with the linear scan.
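
For orientation, the linear scan that the abstract uses as a baseline can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `linear_scan_nn`, the Euclidean metric, and the in-memory list of points are all assumptions made for the example. It examines every point once, so its cost grows linearly with the database size.

```python
import math


def linear_scan_nn(points, query):
    """Return (index, distance) of the point closest to `query`.

    Hypothetical helper for illustration: assumes Euclidean distance
    and a non-empty sequence of equal-length coordinate tuples.
    """
    best_index, best_dist = -1, math.inf
    for i, p in enumerate(points):
        d = math.dist(p, query)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


# Toy data: three points in a 3-dimensional space.
points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.2, 0.1, 0.0)]
idx, dist = linear_scan_nn(points, (0.25, 0.1, 0.0))
```

An index structure is worthwhile only when it reads fewer blocks than this scan does, which is the comparison the abstract refers to.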