Efficient index-based KNN join processing for high-dimensional data

Authors:
Cui Yu; Bin Cui; Shuguang Wang; Jianwen Su
Affiliations:
Department of Computer Science, Monmouth University, West Long Branch, NJ 07764, USA; Department of Computer Science, Peking University, Beijing, China; Department of Computer Science, National University of Singapore, Singapore; Department of Computer Science, University of California, Santa Barbara, CA 93106, USA
Venue:
Information and Software Technology
Year:
2007

Abstract

In many advanced database applications (e.g., multimedia databases), data objects are transformed into high-dimensional points and manipulated in high-dimensional space. One of the most important but costly operations is the similarity join, which combines similar points from multiple datasets. In this paper, we examine the problem of processing the K-nearest neighbor similarity join (KNN join). A KNN join between two datasets, R and S, returns for each point in R its K most similar points in S. We propose a new index-based KNN join approach that uses iDistance as the underlying index structure. We first present the basic algorithm and then propose two enhancements. In the first, we optimize the basic KNN join algorithm by using approximation bounding cubes. In the second, we exploit the reduced dimensions of the data space. We conducted an extensive experimental study using both synthetic and real datasets, and the results verify the performance advantage of our schemes over existing KNN join algorithms.
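
To make the KNN join definition above concrete, the following is a minimal brute-force sketch in Python. It is not the index-based algorithm proposed in the paper: it scans all of S for every point in R and only illustrates what the operation returns. The function name `knn_join`, the use of Euclidean distance as the similarity measure, and the sample points are assumptions made for illustration.

```python
import heapq
import math


def knn_join(R, S, k):
    """Brute-force KNN join (illustrative baseline, not the paper's method).

    For each point r in R, return the k points of S closest to r.
    Euclidean distance is an assumption made for this sketch.
    """
    result = []
    for r in R:
        # Keep the k points of S with the smallest distance to r,
        # ordered from nearest to farthest.
        neighbors = heapq.nsmallest(k, S, key=lambda s: math.dist(r, s))
        result.append((r, neighbors))
    return result


# Hypothetical sample data: two small 2-dimensional datasets.
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (4.0, 4.0), (9.0, 9.0)]

pairs = knn_join(R, S, k=2)
for r, neighbors in pairs:
    print(r, "->", neighbors)
```

Note that the operation is asymmetric: joining R with S returns neighbors from S for each point in R, which generally differs from joining S with R. The sketch computes |R| x |S| distances, which is the cost an index-based approach aims to avoid.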