Probabilistic proximity searching algorithms based on compact partitions

Authors:
Benjamin Bustos; Gonzalo Navarro
Affiliations:
Department of Computer and Information Science, University of Konstanz, Universitaetstr. 10, 78457 Konstanz, Germany; Center for Web Research, Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile
Venue:
Journal of Discrete Algorithms - SPIRE 2002
Year:
2004

Abstract

The main bottleneck of the research in metric space searching is the so-called curse of dimensionality, which makes the task of searching some metric spaces intrinsically difficult, whatever algorithm is used. A recent trend to break this bottleneck resorts to probabilistic algorithms, where it has been shown that one can find 99% of the relevant objects at a fraction of the cost of the exact algorithm. These algorithms are welcome in most applications because resorting to metric space searching already involves a fuzziness in the retrieval requirements. In this paper, we push further in this direction by developing probabilistic algorithms on data structures whose exact versions are the best for high dimensions. As a result, we obtain probabilistic algorithms that are better than the previous ones. We give new insights on the problem and propose a novel view based on time-bounded searching. We also propose an experimental framework for probabilistic algorithms that permits comparing them in offline mode.
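
The time-bounded view mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple compact partition in which every object is assigned to its nearest center and each zone stores a covering radius, and it assumes zones are visited in order of increasing distance from the query to their center. The names `build_partition`, `time_bounded_range_search`, and `budget` are hypothetical. The search stops once it has spent a fixed number of distance computations, so it may miss some relevant objects but never reports an irrelevant one.

```python
import math


def euclidean(a, b):
    """Example metric; any function satisfying the triangle inequality works."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_partition(points, centers, dist):
    """Assign each point to its nearest center and record each zone's covering radius."""
    zones = [{"center": c, "radius": 0.0, "members": []} for c in centers]
    for p in points:
        d, i = min((dist(p, z["center"]), i) for i, z in enumerate(zones))
        zones[i]["members"].append(p)
        zones[i]["radius"] = max(zones[i]["radius"], d)
    return zones


def time_bounded_range_search(zones, q, r, dist, budget):
    """Range query (q, r) limited to at most `budget` distance computations.

    Returns (results, used). With a large enough budget the answer is exact;
    with a small one it is a subset of the exact answer.
    """
    used = 0
    candidates = []
    for z in zones:
        if used >= budget:
            break
        d = dist(q, z["center"])
        used += 1
        # Triangle inequality: if d - radius > r, no member can lie within r of q.
        if d - z["radius"] <= r:
            candidates.append((d, z))
    # Assumed heuristic: zones with closer centers are examined first.
    candidates.sort(key=lambda t: t[0])

    results = []
    for _, z in candidates:
        for p in z["members"]:
            if used >= budget:
                return results, used
            used += 1
            if dist(q, p) <= r:
                results.append(p)
    return results, used
```

Running the same query with several values of `budget` and comparing each result against the exact answer gives the fraction of relevant objects retrieved as a function of the work allowed, which is the kind of trade-off the abstract describes.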