Contorting high dimensional data for efficient main memory KNN processing

  • Authors: Bin Cui; Beng Chin Ooi; Jianwen Su; Kian-Lee Tan
  • Affiliations: National University of Singapore, Singapore; National University of Singapore, Singapore; University of California, Santa Barbara, CA; National University of Singapore, Singapore
  • Venue: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data
  • Year: 2003

Abstract

In this paper, we present a novel index structure, called the Δ-tree, to speed up the processing of high-dimensional K-nearest neighbor (KNN) queries in a main memory environment. The Δ-tree is a multi-level structure in which each level represents the data space at a different dimensionality: the number of dimensions increases towards the leaf level, which stores the data at their full dimensionality. The reduced dimensions at the upper levels are obtained using Principal Component Analysis (PCA), which has the desirable property that the first few dimensions capture most of the information in the dataset. Each level of the tree serves to prune the search space more efficiently, since the reduced dimensions better exploit the small cache line size; moreover, distance computation on lower-dimensional data is less expensive. We also propose an extension, called the Δ+-tree, which globally clusters the data space and then further partitions each cluster into small regions to reduce the search space. We conducted extensive experiments to evaluate the proposed structures against existing techniques on different kinds of datasets. Our results show that the Δ+-tree is superior in most cases.
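
The multi-level pruning idea behind the abstract can be illustrated with a small sketch. The snippet below is a minimal, hypothetical illustration, not the authors' Δ-tree implementation: it rotates the data with PCA and accumulates partial distances over the leading principal components, level by level, as lower bounds on the full Euclidean distance to prune KNN candidates early. The level dimensionalities and the flat candidate scan are illustrative assumptions; the actual Δ-tree organizes these levels as a tree, and the Δ+-tree additionally clusters and partitions the data space.

```python
# Minimal sketch (assumed, not the paper's code): PCA-based partial-distance
# pruning for main-memory KNN. Because PCA is an orthonormal rotation, the
# squared distance over the first d components lower-bounds the full distance.
import heapq
import numpy as np

def pca_transform(data):
    """Center the data and rotate it onto its principal components."""
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T, mean, vt

def knn_with_pca_pruning(transformed, query_t, k, level_dims):
    """Return indices of the k nearest neighbors of query_t (PCA-transformed).

    level_dims: increasing dimensionalities (e.g. (4, 16, full_dim)), mimicking
    the levels of increasing dimensionality described in the abstract.
    """
    n, _ = transformed.shape
    heap = []  # max-heap via negated squared distances; holds current k best
    for i in range(n):
        partial = 0.0
        prev = 0
        pruned = False
        for d in level_dims:
            diff = transformed[i, prev:d] - query_t[prev:d]
            partial += float(diff @ diff)
            prev = d
            # Prune as soon as the lower bound exceeds the current k-th distance.
            if len(heap) == k and partial >= -heap[0][0]:
                pruned = True
                break
        if pruned:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-partial, i))
        else:
            heapq.heappushpop(heap, (-partial, i))
    return sorted(i for _, i in heap)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 32))
    transformed, mean, vt = pca_transform(data)
    query = rng.normal(size=32)
    query_t = (query - mean) @ vt.T
    print(knn_with_pca_pruning(transformed, query_t, k=5, level_dims=(4, 16, 32)))
```

In this sketch, most candidates are rejected after examining only the first few principal components, which keeps both the number of distance operations and the amount of data touched per candidate small; this mirrors the cache-line and computation-cost argument made in the abstract.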