Multimedia Tools and Applications
Similarity search implemented via k-nearest-neighbor (k-NN) queries on multidimensional indices is an extremely useful paradigm for content-based image retrieval. As the dimensionality of feature vectors increases, the curse of dimensionality sets in: the performance of k-NN search on disk-resident indices in the R-tree family degrades rapidly because index pages overlap in high dimensions. This study addresses the problem by exploiting the double filtering effect of clustering and indexing. The clustering algorithm ensures that the largest cluster fits into main memory and that only the clusters closest to a query point need to be searched, and hence loaded into main memory. We organize the data in each cluster with the ordered-partition tree (OP-tree), a main-memory-resident index that is not prone to the curse of dimensionality and is highly efficient for processing k-NN queries. We serialize an OP-tree by writing its dynamically allocated nodes into contiguous memory locations, optimize its parameters, and make it persistent by writing it to disk. Because a cluster and its OP-tree are read or written with a single sequential disk access, the I/O time benefits from the higher data transfer rates of modern disk drives. The performance of the index is further improved by applying the Karhunen-Loève transformation (KLT) to the dataset, since this results in a more efficient computation of distances for k-NN queries. We compare OP-trees and sequential scans with and without the KLT, and with and without a shortcut method for calculating Euclidean distances. A comparison against the OMNI-sequential scan is also reported. Finally, we compare a clustered and persistent version of the OP-tree against a clustered version of the SR-tree and against the VA-file method. CPU time is measured and elapsed time is estimated.
It is observed that the OP-tree index outperforms the other two methods and that the improvement increases with the number of dimensions.
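The double filtering idea (prune whole clusters first, then prune points inside the surviving clusters) can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: each cluster is summarized by a centroid and a radius, the within-cluster search is a plain scan rather than an OP-tree, and the "shortcut" for Euclidean distances is taken to be partial-distance early abandonment (the paper's shortcut may differ). The names `Cluster`, `partial_sq_dist`, and `clustered_knn` are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical cluster record: centroid, member points, derived radius."""
    centroid: list[float]
    points: list[list[float]]
    radius: float = field(init=False)

    def __post_init__(self) -> None:
        # Radius = largest distance from the centroid to any member point.
        self.radius = max((math.dist(self.centroid, p) for p in self.points), default=0.0)


def partial_sq_dist(q, p, bound):
    """Squared Euclidean distance, abandoned once the running sum reaches bound.

    Returns math.inf when p cannot enter the current k-NN set (assumed shortcut).
    """
    s = 0.0
    for a, b in zip(q, p):
        d = a - b
        s += d * d
        if s >= bound:
            return math.inf
    return s


def clustered_knn(clusters, q, k):
    """Return the k nearest points to q as (distance, point) pairs, nearest first."""
    def lower_bound(c):
        # Triangle inequality: no point of c is closer than dist(q, centroid) - radius.
        return max(0.0, math.dist(q, c.centroid) - c.radius)

    heap = []  # max-heap via negated squared distances: (-d2, tiebreak, point)
    counter = 0
    # First filter: visit clusters in order of their lower bound and stop
    # once no remaining cluster can hold a closer point. In the paper's setting
    # a cluster would be loaded from disk only when it is visited.
    for c in sorted(clusters, key=lower_bound):
        lb = lower_bound(c)
        if len(heap) == k and lb * lb >= -heap[0][0]:
            break
        # Second filter: scan the cluster with partial-distance pruning.
        for p in c.points:
            bound = -heap[0][0] if len(heap) == k else math.inf
            d2 = partial_sq_dist(q, p, bound)
            if d2 == math.inf:
                continue
            counter += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d2, counter, p))
            else:
                heapq.heapreplace(heap, (-d2, counter, p))
    return [(math.sqrt(-nd), p) for nd, _, p in sorted(heap, reverse=True)]


clusters = [
    Cluster([0.0, 0.0], [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
    Cluster([10.0, 10.0], [[10.0, 10.0], [11.0, 10.0]]),
]
result = clustered_knn(clusters, [0.2, 0.1], k=2)
print([p for _, p in result])  # [[0.0, 0.0], [1.0, 0.0]]
```

In this example the second cluster is never scanned: its lower bound (about 12.9) already exceeds the distance to the second neighbor found in the first cluster, and the point `[0.0, 1.0]` is abandoned after its second coordinate.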
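Why the KLT can make distance computation cheaper is sketched below, under an assumption of ours: that distances are computed with a partial-distance shortcut that scans dimensions in order. The KLT (computed here as an eigendecomposition of the sample covariance) is a rotation after centering, so it preserves Euclidean distances and therefore k-NN results, while ordering dimensions by decreasing variance so that a partial sum tends to exceed the current k-th distance after fewer coordinates. The synthetic data and the helper names `klt_fit`, `klt_apply`, and `scan_cost` are hypothetical; this is not the paper's experimental setup.

```python
import numpy as np


def klt_fit(X):
    """Estimate the Karhunen-Loève transform (KLT) of the rows of X.

    Returns the mean vector and an orthonormal basis whose columns are the
    covariance eigenvectors, ordered by decreasing eigenvalue (variance).
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]


def klt_apply(X, mean, basis):
    """Project rows of X (or a single vector) onto the KLT basis."""
    return (X - mean) @ basis


def scan_cost(data, q, k):
    """Sequential-scan k-NN with partial distances; returns coordinates read."""
    best = []  # ascending squared distances of the current k nearest points
    reads = 0
    for p in data:
        bound = best[-1] if len(best) == k else float("inf")
        s = 0.0
        for a, b in zip(q, p):
            reads += 1
            s += (a - b) ** 2
            if s >= bound:
                break  # p cannot enter the k-NN set; skip remaining dimensions
        else:
            best = sorted(best + [s])[:k]
    return reads


# Hypothetical data: 32-D vectors varying mostly along 2 hidden directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2001, 2))
X = latent @ rng.normal(size=(2, 32)) + 0.05 * rng.normal(size=(2001, 32))
X, q = X[:-1], X[-1]  # hold out the last vector as the query

mean, basis = klt_fit(X)
Y, qy = klt_apply(X, mean, basis), klt_apply(q, mean, basis)

# Distances are unchanged by the rotation, so the k-NN answer is unchanged too.
assert np.allclose(np.linalg.norm(X - q, axis=1), np.linalg.norm(Y - qy, axis=1))

cost_raw = scan_cost(X.tolist(), q.tolist(), k=10)
cost_klt = scan_cost(Y.tolist(), qy.tolist(), k=10)
print("coordinates read, original:", cost_raw)
print("coordinates read, after KLT:", cost_klt)
```

The actual savings depend on how concentrated the variance of the dataset is; on data whose variance is spread evenly across dimensions the KLT would help little.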
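Serializing a tree of dynamically allocated nodes into contiguous memory generally means replacing pointers with array indices, so the whole structure becomes one byte buffer that can be written and read back with a single sequential I/O. The sketch below shows that general technique on a hypothetical binary partition node (`Node` with `split_dim` and `split_val`); it is not the OP-tree's actual node layout, and it omits the data points that leaves would hold. The record format and the names `serialize` and `deserialize` are assumptions for illustration.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical pointer-based partition node (not the OP-tree's layout)."""
    split_dim: int
    split_val: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# Fixed-size little-endian record: split_dim, split_val, left index, right index.
# A child index of -1 stands for a missing child.
RECORD = struct.Struct("<idii")


def serialize(root):
    """Flatten the tree into one contiguous bytes buffer in preorder."""
    records = []

    def visit(n):
        if n is None:
            return -1
        idx = len(records)
        records.append(b"")  # reserve the slot so a parent precedes its children
        left = visit(n.left)
        right = visit(n.right)
        records[idx] = RECORD.pack(n.split_dim, n.split_val, left, right)
        return idx

    visit(root)
    return b"".join(records)


def deserialize(buf):
    """Rebuild the pointer-based tree from the buffer; record 0 is the root."""
    count = len(buf) // RECORD.size
    recs = [RECORD.unpack_from(buf, i * RECORD.size) for i in range(count)]

    def build(i):
        if i < 0:
            return None
        dim, val, left, right = recs[i]
        return Node(dim, val, build(left), build(right))

    return build(0) if count else None


tree = Node(0, 0.5, Node(1, 0.25), Node(1, 0.75, None, Node(0, 0.9)))
buf = serialize(tree)
print(len(buf), deserialize(buf) == tree)  # 80 True
```

Writing `buf` with a single `f.write(buf)` and loading it with a single `f.read()` corresponds to the one sequential disk access per cluster described above. For very deep trees the recursive helpers would need to be made iterative to stay within Python's recursion limit.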