Effectiveness of optimal incremental multi-step nearest neighbor search

Authors:
Ming Zhang; Reda Alhajj; Jon Rokne
Affiliations:
Department of Computer Science, University of Calgary, Calgary, Alberta, Canada; Department of Computer Science, University of Calgary, Calgary, Alberta, Canada and Department of Computer Science, Global University, Beirut, Lebanon; Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
Venue:
Expert Systems with Applications: An International Journal
Year:
2010

References (26):

Distance-based indexing for high-dimensional metric spaces. SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
The SR-tree: an index structure for high-dimensional nearest neighbor queries. SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Optimal multi-step k-nearest neighbor search. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Data structures and algorithms for nearest neighbor search in general metric spaces. SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Transactions on Database Systems (TODS)
Distance browsing in spatial databases. ACM Transactions on Database Systems (TODS)
Multidimensional binary search trees used for associative searching. Communications of the ACM
The Earth Mover's Distance as a Metric for Image Retrieval. International Journal of Computer Vision
The K-D-B-tree: a search structure for large multidimensional dynamic indexes. SIGMOD '81 Proceedings of the 1981 ACM SIGMOD international conference on Management of data
Searching in metric spaces with user-defined and approximate distances. ACM Transactions on Database Systems (TODS)
Efficient Retrieval of Similar Time Sequences Under Time Warping. ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Similarity Indexing with the SS-tree. ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Similarity Search without Tears: The OMNI Family of All-purpose Access Methods. Proceedings of the 17th International Conference on Data Engineering
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Fast Nearest Neighbor Search in Medical Image Databases. VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
The X-tree: An Index Structure for High-Dimensional Data. VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
An Index-Based Approach for Similarity Search Supporting Time Warping in Large Sequence Databases. Proceedings of the 17th International Conference on Data Engineering
A Metric for Distributions with Applications to Image Databases. ICCV '98 Proceedings of the Sixth International Conference on Computer Vision
Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision
Indexing High-Dimensional Data for Efficient In-Memory Similarity Search. IEEE Transactions on Knowledge and Data Engineering
iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems (TODS)
Approximation Techniques for Indexing the Earth Mover's Distance in Multimedia Databases. ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Reference-based indexing of sequence databases. VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Earth mover distance over high-dimensional spaces. Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Optimal incremental multi-step nearest-neighbor search. Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems
Generalizing the optimality of multi-step k-nearest neighbor query processing. SSTD'07 Proceedings of the 10th international conference on Advances in spatial and temporal databases

Abstract

The development of techniques that facilitate effective similarity search is important for many applications, such as multimedia databases, content-based image retrieval, molecular biology, medical imaging, and object recognition. Two common operations in this context are range queries and k-nearest neighbor search in high-dimensional space. However, the distance measures used to determine the dissimilarity between high-dimensional feature vectors are often expensive to compute. To reduce the number of expensive distance calculations in the search process, Korn, Sidiropoulos, Faloutsos, Siegel, and Protopapas (1996) proposed a multi-step algorithm with two stages: filtering and refinement. The filtering stage uses an easily computable lower-bound distance measure to select a candidate set, and the refinement stage confines the expensive distance computation to that small candidate set. Seidl and Kriegel (1998) later improved this algorithm so that the filtering stage produces an optimal-sized candidate set; the improved algorithm is said to be filtering optimal. However, the improved algorithm cannot produce results incrementally in the refinement stage: it can only start to output results after the whole search process has stopped, which is a disadvantage in real applications. In this paper, we experimentally demonstrate the applicability and effectiveness of an extended version of the algorithm that produces the nearest neighbors incrementally and optimally, in the sense that a nearest neighbor is output as soon as it can be determined from the existing information; thus, nearest neighbors are produced in order of increasing distance. Our algorithm is both filtering and refinement optimal, and it serves real applications well.
We have already proved the optimality of the proposed extended algorithm (Zhang, Alhajj, & Rokne, 2008); here we empirically demonstrate its independence of the number of nearest neighbors and its effectiveness in retrieving results early compared with the previous algorithm.
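
The filter-and-refine idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm or code: the function names, the toy data, the use of Euclidean distance as the "expensive" measure, and the projection onto the first two coordinates as the lower bound are all assumptions made for illustration. The sketch ranks objects by the cheap lower bound, refines them one at a time, and outputs a refined object as soon as its exact distance cannot be beaten by any object not yet refined.

```python
import heapq
import math
from itertools import islice


def exact_dist(a, b):
    """Stand-in for the expensive distance (assumed here: full Euclidean)."""
    return math.dist(a, b)


def lower_bound_dist(a, b, dims=2):
    """Cheap lower bound (assumed): Euclidean distance on the first `dims` coordinates."""
    return math.dist(a[:dims], b[:dims])


def incremental_multistep_nn(query, data, stats=None):
    """Yield (exact distance, index) pairs in nondecreasing exact distance."""
    # Filtering: rank all objects by the cheap lower bound. A real system would
    # pull this ranking lazily from an index instead of sorting everything.
    ranking = sorted((lower_bound_dist(query, p), i) for i, p in enumerate(data))
    refined = []  # min-heap of (exact distance, index) for refined objects
    for lb, i in ranking:
        # Every unrefined object has exact distance >= its lower bound >= lb,
        # so any refined object with exact distance <= lb is safe to output now.
        while refined and refined[0][0] <= lb:
            yield heapq.heappop(refined)
        # Refinement: pay for one expensive distance computation.
        heapq.heappush(refined, (exact_dist(query, data[i]), i))
        if stats is not None:
            stats["exact"] = stats.get("exact", 0) + 1
    # No unrefined objects remain; flush the heap in order.
    while refined:
        yield heapq.heappop(refined)


# Hypothetical toy data: five 3-D points and a query at the origin.
query = (0.0, 0.0, 0.0)
data = [(1, 0, 0), (0, 2, 0), (3, 0, 0), (0, 0, 5), (4, 4, 0)]
stats = {}
# Ask for only the two nearest neighbors; the generator stops refining early.
first_two = list(islice(incremental_multistep_nn(query, data, stats), 2))
```

Because the function is a generator, a caller does not need to fix the number of neighbors in advance: it can keep pulling results until it has enough, and `stats["exact"]` records how many expensive distance computations were needed up to that point.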