On Index-Free Similarity Search in Metric Spaces

Authors:
Tomáš Skopal;Benjamin Bustos
Affiliations:
Department of Software Engineering, FMP, Charles University in Prague, Prague, Czech Republic 118 00;Department of Computer Science, University of Chile, Santiago, Chile
Venue:
DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
Year:
2009

Abstract

Metric access methods (MAMs) serve as a tool for speeding up similarity queries. However, all MAMs developed so far are index-based: they need to build an index on a given database. The indexing is either static (the whole database is indexed at once) or dynamic (insertions and deletions are supported), but a preprocessing step is always needed. In this paper, we propose D-file, the first MAM that requires no indexing at all. This feature is especially beneficial in domains such as data mining and streaming databases, where data is produced much more intensively than it is queried, so that indexing becomes the bottleneck of the entire production/querying scheme. The idea of D-file is to extend the trivial sequential file (in effect, an abstraction over the original database) with a so-called D-cache. The D-cache is a main-memory structure that keeps track of the distance computations spent on processing all similarity queries so far (within a runtime session). Based on the distances stored in the D-cache, the D-file can cheaply determine lower bounds of some distances without computing those distances explicitly, which results in faster queries. Our experimental evaluation shows that the query efficiency of D-file is comparable to that of state-of-the-art index-based MAMs, but at zero indexing cost.
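
The abstract does not spell out how the lower bounds are derived, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the standard triangle-inequality bound for metric spaces, |d(q, p) - d(p, o)| <= d(q, o), and it assumes that earlier query objects act as the reference points p. The names `DistanceCache`, `lower_bound`, and `range_query` are hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple


class DistanceCache:
    """Hypothetical stand-in for a D-cache: remembers distances computed so far."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[Hashable, Hashable], float] = {}

    def put(self, a: Hashable, b: Hashable, dist: float) -> None:
        # A metric is symmetric, so store the distance under both key orders.
        self._store[(a, b)] = dist
        self._store[(b, a)] = dist

    def get(self, a: Hashable, b: Hashable) -> Optional[float]:
        return self._store.get((a, b))

    def lower_bound(self, o_id: Hashable, pivots: Dict[Hashable, float]) -> float:
        """Best lower bound on d(q, o), given d(q, p) for each pivot p.

        Uses |d(q, p) - d(p, o)| <= d(q, o); pivots without a cached
        d(p, o) contribute nothing, so the result may be the trivial 0.
        """
        best = 0.0
        for p_id, d_qp in pivots.items():
            d_po = self.get(p_id, o_id)
            if d_po is not None:
                best = max(best, abs(d_qp - d_po))
        return best


def range_query(
    db: Dict[Hashable, float],
    q_id: Hashable,
    q: float,
    radius: float,
    dist: Callable[[float, float], float],
    cache: DistanceCache,
    pivots: Dict[Hashable, float],
) -> Tuple[List[Hashable], int]:
    """Sequential scan that skips objects whose lower bound exceeds the radius.

    Returns the matching ids and the number of distance computations
    spent on database objects.
    """
    result: List[Hashable] = []
    computed = 0
    for o_id, o in db.items():
        if cache.lower_bound(o_id, pivots) > radius:
            continue  # o cannot be within the radius; no distance computed
        d = dist(q, o)
        computed += 1
        cache.put(q_id, o_id, d)  # remember the distance for later queries
        if d <= radius:
            result.append(o_id)
    return result, computed


# Toy data (an assumption for illustration): points on a line, d(x, y) = |x - y|.
db = {"a": 0.0, "b": 1.0, "c": 10.0, "d": 11.0}
metric = lambda x, y: abs(x - y)
cache = DistanceCache()

# First query: the cache is empty, so every distance must be computed.
first, n_first = range_query(db, "q1", 0.5, 1.0, metric, cache, pivots={})

# Second query: one extra distance, d(q2, q1), turns q1 into a pivot.
pivots = {"q1": metric(0.0, 0.5)}
second, n_second = range_query(db, "q2", 0.0, 1.0, metric, cache, pivots)
```

In this toy run the first query pays for all four distances, while the second query computes only two of them (plus the one distance to the earlier query), because the cached distances from `q1` already prove that `c` and `d` lie outside the query radius. No structure is built ahead of time; the cache fills up only as a side effect of answering queries.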