Distance-based indexing for high-dimensional metric spaces
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
The pyramid-technique: towards breaking the curse of dimensionality
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Making B+- trees cache conscious in main memory
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Optimizing multidimensional index trees for main memory access
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Main-memory index structures with fixed-size partial keys
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Improving index performance through prefetching
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
ACM Computing Surveys (CSUR)
The TV-tree: an index structure for high-dimensional data
The VLDB Journal — The International Journal on Very Large Data Bases - Spatial Database Systems
Fast Indexing and Visualization of Metric Data Sets using Slim-Trees
IEEE Transactions on Knowledge and Data Engineering
Similarity Search without Tears: The OMNI Family of All-purpose Access Methods
Proceedings of the 17th International Conference on Data Engineering
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Indexing the Distance: An Efficient Method to KNN Processing
Proceedings of the 27th International Conference on Very Large Data Bases
Contorting high dimensional data for efficient main memory KNN processing
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Main Memory Indexing: The Case for BD-Tree
IEEE Transactions on Knowledge and Data Engineering
iDistance: An adaptive B+-tree based indexing method for nearest neighbor search
ACM Transactions on Database Systems (TODS)
Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space
Knowledge and Information Systems
Effectiveness of optimal incremental multi-step nearest neighbor search
Expert Systems with Applications: An International Journal
Indexing high-dimensional data for main-memory similarity search
Information Systems
Scalable kNN search on vertically stored time series
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Generalizing the k-Windows clustering algorithm in metric spaces
Mathematical and Computer Modelling: An International Journal
Boosting multi-kernel locality-sensitive hashing for scalable image retrieval
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Publishing microdata with a robust privacy guarantee
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
In main memory systems, the L2 cache typically employs cache line sizes of 32-128 bytes. These values are relatively small compared to high-dimensional data, e.g., 32D. The consequence is that existing techniques (on low-dimensional data) that minimize cache misses are no longer effective. In this paper, we present a novel index structure, called \Delta{\hbox{-}}{\rm{tree}}, to speed up the high-dimensional query in main memory environment. The \Delta{\hbox{-}}{\rm{tree}} is a multilevel structure where each level represents the data space at different dimensionalities: the number of dimensions increases toward the leaf level. The remaining dimensions are obtained using Principal Component Analysis. Each level of the tree serves to prune the search space more efficiently as the lower dimensions can reduce the distance computation and better exploit the small cache line size. Additionally, the top-down clustering scheme can capture the feature of the data set and, hence, reduces the search space. We also propose an extension, called \Delta^+{\hbox{-}}{\rm{tree}}, that globally clusters the data space and then partitions clusters into small regions. The \Delta^+{\hbox{-}}{\rm{tree}} can further reduce the computational cost and cache misses. We conducted extensive experiments to evaluate the proposed structures against existing techniques on different kinds of data sets. Our results show that the \Delta^+{\hbox{-}}{\rm{tree}} is superior in most cases.