Approximate and probabilistic methods

Authors:
Paolo Ciaccia;Marco Patella
Affiliations:
Università di Bologna, Italy;Università di Bologna, Italy
Venue:
SIGSPATIAL Special
Year:
2010

Citing 12

Density-based indexing for approximate nearest-neighbor queries. KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining.

When Is "Nearest Neighbor" Meaningful? ICDT '99 Proceedings of the 7th International Conference on Database Theory.

A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases.

Similarity Search in High Dimensions via Hashing. VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases.

What Is the Nearest Neighbor in High Dimensional Spaces? VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases.

PAC Nearest Neighbor Queries: Approximate and Controlled Search in High-Dimensional and Metric Spaces. ICDE '00 Proceedings of the 16th International Conference on Data Engineering.

Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling).

Similarity Search: The Metric Space Approach (Advances in Database Systems).

The Many Facets of Approximate Similarity Search. SISAP '08 Proceedings of the First International Workshop on Similarity Search and Applications.

Effective Proximity Retrieval by Ordering Permutations. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Approximate similarity search: A multi-faceted problem. Journal of Discrete Algorithms.

Proceedings of the 2009 Second International Workshop on Similarity Search and Applications. SISAP '09.

Abstract

Why? The metric search paradigm has so far been applied successfully to several real-world problems, ranging from multimedia to data mining, from decision support to pattern recognition, to statistical and medical applications. Indeed, its simplicity makes it a perfect candidate for solving a variety of similarity problems arising in applications [4, 11]. The casual reader may wonder what prevents the metric space paradigm from becoming ubiquitously applicable to the ever-increasing range of applications that can benefit from it. The answer to this question is so dreadful that researchers have given it the hideous name of "curse of dimensionality" (an entry in this issue of the bulletin is devoted to this concept).

In essence, the curse of dimensionality says that, whenever the (intrinsic) dimensionality D of the metric space is high, an efficient solution to NN (nearest neighbor) queries is impossible, and only a sequential scan of the whole dataset can guarantee that the correct result is found. This behavior is basically due to the fact that the variance of the distances to the query object q vanishes with increasing values of D, so that all data objects have almost the same distance to q. In such scenarios, one may however argue that NN queries lose significance, since any data object would have a distance to the query object comparable to the minimal one [2].

On the other hand, in several real-world cases searching for the exact NN is still difficult, yet the distribution of distances exhibits a sufficiently high variance to make the problem worth solving. In such cases, it is also observed that locating the NN of a query point is, in itself, a relatively easy task, whose complexity indeed decreases with space dimensionality. As a matter of fact, the hard problem in high-D exact NN search is to determine when to stop, i.e., how to guarantee that the current result is the correct one. It follows that most of the time spent in an (exact) NN search is wasted, since little (or no) improvement of the result is obtained during it [5].
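
The concentration of distances behind the curse of dimensionality can be observed with a small experiment. The following is only a sketch under assumptions that are not part of the original text: points are drawn uniformly at random from the unit hypercube (so the embedding dimension stands in for the intrinsic dimensionality D), the Euclidean distance is used, and the function name `distance_spread` and its parameters are hypothetical. It reports how far the farthest point is from the query relative to the nearest one, and the ratio between the standard deviation and the mean of the distances.

```python
import math
import random
import statistics


def distance_spread(dim, n_points=500, seed=0):
    """Return (relative_contrast, std_over_mean) for the Euclidean distances
    between one random query q and n_points random points in [0, 1]^dim.

    Assumption for illustration only: uniformly distributed data.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))

    d_min, d_max = min(dists), max(dists)
    # How much farther the farthest object is than the nearest one.
    relative_contrast = (d_max - d_min) / d_min
    # Spread of the distances relative to their typical size.
    std_over_mean = statistics.pstdev(dists) / statistics.mean(dists)
    return relative_contrast, std_over_mean


if __name__ == "__main__":
    for dim in (2, 20, 200):
        contrast, ratio = distance_spread(dim)
        print(f"D={dim:4d}  relative contrast={contrast:9.3f}  std/mean={ratio:.3f}")
```

Under these assumptions both quantities shrink as D grows: the nearest and the farthest objects end up at almost the same distance from q, which is the situation in which an exact NN query becomes both expensive and of limited significance.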