A Fast k Nearest Neighbor Finding Algorithm Based on the Ordered Partition
IEEE Transactions on Pattern Analysis and Machine Intelligence
The SR-tree: an index structure for high-dimensional nearest neighbor queries
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Multidimensional access methods
ACM Computing Surveys (CSUR)
Scalability for clustering algorithms revisited
ACM SIGKDD Explorations Newsletter
Searching Multimedia Databases by Content
Searching Multimedia Databases by Content
Image Databases: Search and Retrieval of Digital Imagery
Image Databases: Search and Retrieval of Digital Imagery
Clustering for Approximate Similarity Search in High-Dimensional Spaces
IEEE Transactions on Knowledge and Data Engineering
Similarity Search without Tears: The OMNI Family of All-purpose Access Methods
Proceedings of the 17th International Conference on Data Engineering
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
The X-tree: An Index Structure for High-Dimensional Data
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
IEEE Transactions on Knowledge and Data Engineering
Rules of Thumb in Data Engineering
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
High-dimensional indexing methods utilizing clustering and dimensionality reduction
High-dimensional indexing methods utilizing clustering and dimensionality reduction
Hi-index | 0.00 |
Similarity search implemented via k-Nearest-Neighbor (k-NN) queries is an extremely useful paradigm in content based image retrieval (CBIR), which is costly on high-dimensional indices due to the curse of dimensionality. We improve k-NN query processing by utilizing the double filtering effect of clustering and indexing on a persistent version of the Ordered-Partition tree (OP-tree) index, which is highly efficient in processing k-NN queries. The OP-tree is made persistent by writing it onto disk after serialization, i.e. arranging its nodes into contiguous memory locations, so that the high transfer rate of modern disk drives is exploited. We first report experimental results to optimize OP-tree parameters. We then compare OP-trees and sequential scans with options for the Karhunen-Loève transform and Euclidean distance calculation. Comparisons against OMNI-based sequential scan are also reported. We finally compare a clustered and persistent version of the OP-tree against a clustered version of the SR-tree and the VA-File method. It is observed that the OP-tree index outperforms the other two methods and that the improvement increases with the number of dimensions.