Persistent clustered main memory index for accelerating k-NN queries on high dimensional datasets

Authors:
Lijuan Zhang; Alexander Thomasian
Affiliations:
New Jersey Institute of Technology, Newark, NJ; New Jersey Institute of Technology, Newark, NJ
Venue:
Proceedings of the 2nd International Workshop on Computer Vision Meets Databases
Year:
2005

Abstract

Similarity search implemented via k-nearest-neighbor (k-NN) queries is an extremely useful paradigm in content-based image retrieval (CBIR), but such queries are costly on high-dimensional indices due to the curse of dimensionality. We improve k-NN query processing by utilizing the double filtering effect of clustering and indexing on a persistent version of the Ordered-Partition tree (OP-tree) index, which is highly efficient in processing k-NN queries. The OP-tree is made persistent by serializing it, i.e., arranging its nodes into contiguous memory locations, and then writing it to disk, so that the high transfer rate of modern disk drives is exploited. We first report experimental results used to optimize OP-tree parameters. We then compare OP-trees and sequential scans with options for the Karhunen-Loève transform and Euclidean distance calculation. Comparisons against OMNI-based sequential scan are also reported. Finally, we compare a clustered and persistent version of the OP-tree against a clustered version of the SR-tree and the VA-File method. The OP-tree index outperforms the other two methods, and the improvement increases with the number of dimensions.
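
The persistence step described in the abstract (placing the nodes of a main memory tree into contiguous locations so the whole index can be transferred in one sequential disk access) can be illustrated with a minimal sketch. This is not the paper's implementation: the binary `Node` class, the fixed 16-byte record layout, and the function names `serialize`, `deserialize`, `save`, and `load` are all assumptions made for illustration, and a real OP-tree node would hold partition boundaries and multiple children rather than a single value.

```python
import os
import struct
import tempfile

# Hypothetical record layout (an assumption, not the paper's format):
# one float64 value plus two int32 child indices, 16 bytes per node.
# Child pointers are replaced by indices into the record array; -1 means "no child".
RECORD = struct.Struct("<dii")


class Node:
    """Stand-in for a pointer-based main memory tree node."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def serialize(root):
    """Flatten the tree into one contiguous bytes object in preorder."""
    records = []

    def visit(node):
        if node is None:
            return -1
        index = len(records)
        records.append(None)  # reserve the slot so a parent precedes its children
        left = visit(node.left)
        right = visit(node.right)
        records[index] = (node.value, left, right)
        return index

    visit(root)
    return b"".join(RECORD.pack(*rec) for rec in records)


def deserialize(buf):
    """Rebuild the pointer-based tree from the contiguous buffer."""

    def build(index):
        if index < 0:
            return None
        value, left, right = RECORD.unpack_from(buf, index * RECORD.size)
        return Node(value, build(left), build(right))

    return build(0) if buf else None


def save(root, path):
    with open(path, "wb") as f:
        f.write(serialize(root))  # a single sequential write of the whole index


def load(path):
    with open(path, "rb") as f:
        return deserialize(f.read())  # a single sequential read of the whole index


def preorder(node):
    """Helper used only to compare trees in the usage example."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)
```

A usage example under the same assumptions: `save(Node(5.0, Node(2.0), Node(8.0)), "tree.bin")` followed by `load("tree.bin")` reconstructs a tree with the same preorder traversal. Because child links are stored as array indices rather than memory addresses, the file can be read back in one sequential transfer and does not depend on where the nodes were originally allocated.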