Searching in metric spaces with user-defined and approximate distances

Authors:
Paolo Ciaccia; Marco Patella
Affiliations:
DEIS - CSITE-CNR, University of Bologna, Bologna, Italy; DEIS - CSITE-CNR, University of Bologna, Bologna, Italy
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2002

Abstract

Novel database applications, such as multimedia, data mining, and e-commerce, make intensive use of similarity queries to retrieve the objects that best fit a user request. Because such queries become more effective when the user can personalize the similarity criterion by which database objects are evaluated and ranked, access methods that efficiently support user-defined similarity queries are a basic requirement. In this article we introduce the first index structure, called the QIC-M-tree, that can process user-defined queries in generic metric spaces, that is, spaces where the only information available about indexed objects is their relative distances. The QIC-M-tree is a metric access method that can deal with several distinct distances at a time: (1) a query (user-defined) distance, (2) an index distance (used to build the tree), and (3) a comparison (approximate) distance (used to quickly discard uninteresting parts of the tree from the search). We develop an analytical cost model that accurately characterizes the performance of the QIC-M-tree, and we validate this model through extensive experimentation on real metric data sets. In particular, our analysis is able to predict the best evaluation strategy (i.e., which distances to use) under a variety of configurations by taking into account relevant factors such as the distribution of distances, the cost of computing distances, and the actual index structure. We also prove that the overall saving in CPU search costs when using an approximate distance can be estimated from information on the data set alone (so this measure is independent of the underlying access method), and we show that performance results are closely related to a novel "indexing" error measure.
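The roles of the three distances can be illustrated with a minimal Python sketch. It is not the QIC-M-tree itself. It assumes the standard lower-bounding condition, which the abstract does not state: the comparison and index distances never exceed the query distance by more than a known scaling factor. The function names, the weighted Euclidean query distance, the Chebyshev comparison distance, and the `scale` parameter are all hypothetical choices made for illustration.

```python
import math
import random


def weighted_euclidean(w):
    """Build a user-defined query distance d_Q from positive weights w."""
    def d_q(x, y):
        return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)))
    return d_q


def chebyshev(x, y):
    """Cheap comparison distance d_C (L-infinity), used only for filtering."""
    return max(abs(a - b) for a, b in zip(x, y))


def can_prune_ball(d_index_q_center, covering_radius, r_q, scale):
    """Assumed pruning test for a tree region built with an index distance d_I.

    If d_I(x, y) <= scale * d_Q(x, y) for all x, y, the triangle inequality
    on d_I implies that no object in the ball can lie within r_q of the
    query (under d_Q) when this condition holds.
    """
    return d_index_q_center > covering_radius + scale * r_q


def range_query(data, q, r_q, d_q, d_c, scale):
    """Filter-and-refine range query over a flat list (no tree).

    Assumes d_c(x, y) <= scale * d_q(x, y). Objects whose cheap distance
    already exceeds scale * r_q are discarded without evaluating d_q.
    Returns the hits and the number of exact d_q evaluations performed.
    """
    hits, exact_calls = [], 0
    for o in data:
        if d_c(q, o) > scale * r_q:
            continue  # safe to discard: d_q(q, o) must exceed r_q
        exact_calls += 1
        if d_q(q, o) <= r_q:
            hits.append(o)
    return hits, exact_calls


rng = random.Random(0)
data = [tuple(rng.random() for _ in range(4)) for _ in range(500)]
weights = (1.0, 2.0, 0.5, 4.0)
d_q = weighted_euclidean(weights)
# L_inf <= L_2 <= d_Q / sqrt(min weight), so this scale makes the bound hold.
scale = 1.0 / math.sqrt(min(weights))
q = (0.5, 0.5, 0.5, 0.5)

hits, exact_calls = range_query(data, q, 0.3, d_q, chebyshev, scale)
brute = [o for o in data if d_q(q, o) <= 0.3]
print(len(hits), len(brute), exact_calls, len(data))
```

Under these assumptions the filtered query returns the same objects as a brute-force scan while evaluating the query distance on only part of the data. How large that saving is depends on how tightly the cheap distance approximates the query distance, which is the kind of trade-off the cost model in the abstract is meant to predict.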