Adapting indexing trees to data distribution in feature spaces

Authors:
Xiaoning Qian; Hemant D. Tagare
Affiliations:
Department of Diagnostic Radiology, Yale University, New Haven, CT 06520, United States; Department of Diagnostic Radiology, Yale University, New Haven, CT 06520, United States, and Department of Electrical Engineering, Yale University, New Haven, CT 06520, United States
Venue:
Computer Vision and Image Understanding
Year:
2010

Abstract

Fast similarity retrieval is critical for content-based image retrieval systems. Tree indexing is a classical technique for fast retrieval, but the practical performance gain offered by an indexing tree depends on the intrinsic dimension of the data: data with a low intrinsic dimension can be indexed more efficiently than data with a high intrinsic dimension. This suggests that an indexing tree adapted to the data distribution may be more efficient. This paper proposes two adaptation procedures that are guaranteed to improve indexing efficiency. The procedures are based on a formula for the average number of node tests incurred during retrieval. The formula shows explicitly how indexing performance varies with the distribution of the feature points and of the query. Greedy and optimal tree adaptation procedures are derived from the formula, and both explicitly enhance the retrieval performance of indexing trees. The optimally adapted tree carries the mathematical guarantee that it is the best-performing tree in the set of trees obtainable by node elimination. The adaptation procedures are applied to kdb-trees and hierarchical clustering trees for indexing synthetic as well as real data sets in medical image databases. Experimental results validate the claim that the adaptation procedures increase retrieval efficiency.
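
The idea of adapting a tree by node elimination can be illustrated with a small sketch. The abstract does not state the paper's formula, so everything below is an assumption made for illustration: nodes carry one-dimensional bounding intervals, the average number of node tests is estimated empirically over a set of sample range queries rather than from a closed-form expression, and eliminating a node means attaching its children directly to its parent. The names `Node`, `count_tests`, `average_tests`, and `greedy_adapt` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(eq=False)  # eq=False: nodes are compared by identity, not by field values
class Node:
    lo: float
    hi: float
    children: List["Node"] = field(default_factory=list)


def count_tests(node: Node, q_lo: float, q_hi: float) -> int:
    """Count node tests for the range query [q_lo, q_hi] (assumed cost model)."""
    tests = 1  # testing this node's interval against the query
    if node.hi < q_lo or node.lo > q_hi:
        return tests  # no overlap: the subtree is pruned
    for child in node.children:
        tests += count_tests(child, q_lo, q_hi)
    return tests


def average_tests(root: Node, queries: List[Tuple[float, float]]) -> float:
    """Empirical average number of node tests over the sample queries."""
    return sum(count_tests(root, a, b) for a, b in queries) / len(queries)


def internal_pairs(root: Node) -> List[Tuple[Node, Node]]:
    """Return (parent, node) for every non-root node that has children."""
    pairs, stack = [], [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            if child.children:
                pairs.append((parent, child))
                stack.append(child)
    return pairs


def eliminate(parent: Node, node: Node) -> None:
    """Remove `node` and attach its children directly to `parent`."""
    i = parent.children.index(node)
    parent.children[i:i + 1] = node.children


def restore(parent: Node, node: Node) -> None:
    """Undo `eliminate` for the same (parent, node) pair."""
    i = parent.children.index(node.children[0])
    parent.children[i:i + len(node.children)] = [node]


def greedy_adapt(root: Node, queries: List[Tuple[float, float]]) -> float:
    """Repeatedly apply the single elimination that lowers the average most."""
    best = average_tests(root, queries)
    while True:
        best_pair = None
        for parent, node in internal_pairs(root):
            eliminate(parent, node)
            cost = average_tests(root, queries)
            restore(parent, node)
            if cost < best:
                best, best_pair = cost, (parent, node)
        if best_pair is None:
            return best  # no single elimination improves the average
        eliminate(*best_pair)


# Toy tree: node `a` spans the same interval as the root, so testing it never prunes.
a = Node(0, 100, [Node(0, 10), Node(90, 100)])
c = Node(40, 60, [Node(40, 45), Node(50, 55), Node(56, 60)])
root = Node(0, 100, [a, c])
queries = [(5, 6), (20, 21), (95, 96), (70, 71)]

before = average_tests(root, queries)
after = greedy_adapt(root, queries)
print(before, after)
```

In this toy tree the sketch eliminates `a`, whose interval is no tighter than the root's, and keeps `c`, which prunes three children for every sample query. This is only a greedy variant under the assumed cost model; it carries none of the optimality guarantees the abstract attributes to the paper's optimal procedure.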