In many advanced database applications (e.g., multimedia databases), data objects are transformed into high-dimensional points and manipulated in high-dimensional space. One of the most important but costly operations is the similarity join, which combines similar points from multiple datasets. In this paper, we examine the problem of processing the K-nearest-neighbor similarity join (KNN join). A KNN join between two datasets, R and S, returns for each point in R its K most similar points in S. We propose a new index-based KNN join approach that uses iDistance as the underlying index structure. We first present its basic algorithm and then propose two enhancements. The first optimizes the basic KNN join algorithm by using approximation bounding cubes. The second exploits the reduced dimensionality of the data space. We conducted an extensive experimental study on both synthetic and real datasets, and the results verify the performance advantage of our schemes over existing KNN join algorithms.
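The sketch below illustrates two things in Python. First, it gives the KNN join semantics as a brute-force reference. Second, it shows an index-based KNN join built on the iDistance idea of assigning each point p to a reference point O_i and mapping it to the one-dimensional key i * c + dist(p, O_i).

This is a minimal sketch under stated assumptions, not the authors' algorithm:

- It omits both the bounding-cube and the reduced-dimension enhancements.
- A sorted Python list stands in for the B+-tree.
- The reference points are simply the first few points of S, which is a hypothetical choice.
- Each query re-scans with a doubling radius instead of expanding the search incrementally.

Names such as `IDistanceIndex`, `num_refs`, and `r0` are illustrative only.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
EPS = 1e-9  # small slack so floating-point rounding never drops a boundary point


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimensionality."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn_join(R, S, k):
    """Reference semantics: for each point in R, its k nearest points in S."""
    return [sorted(S, key=lambda s: dist(r, s))[:k] for r in R]


class IDistanceIndex:
    """Simplified iDistance-style index (illustrative sketch only)."""

    def __init__(self, points: List[Point], num_refs: int = 4):
        self.points = points
        # Hypothetical reference-point choice: the first num_refs points.
        self.refs = points[:num_refs]
        self.max_radius = [0.0] * len(self.refs)
        assignments = []
        for idx, p in enumerate(points):
            # Assign p to the partition of its nearest reference point.
            i = min(range(len(self.refs)), key=lambda j: dist(p, self.refs[j]))
            d = dist(p, self.refs[i])
            self.max_radius[i] = max(self.max_radius[i], d)
            assignments.append((i, d, idx))
        # c must exceed every partition radius so key ranges never overlap.
        self.c = 2 * max(self.max_radius, default=0.0) + 1.0
        self.entries = sorted((i * self.c + d, idx) for i, d, idx in assignments)
        self.keys = [key for key, _ in self.entries]

    def _range(self, q, r):
        """Return (distance, idx) for every indexed point within radius r of q."""
        found = []
        for i, ref in enumerate(self.refs):
            dq = dist(q, ref)
            if dq - r > self.max_radius[i] + EPS:
                continue  # the query sphere misses partition i entirely
            # Triangle inequality: any p with dist(q, p) <= r satisfies
            # dq - r <= dist(p, O_i) <= dq + r, so only this key range is scanned.
            lo = i * self.c + max(0.0, dq - r - EPS)
            hi = i * self.c + min(dq + r, self.max_radius[i])
            start = bisect.bisect_left(self.keys, lo)
            end = bisect.bisect_right(self.keys, hi)
            for _, idx in self.entries[start:end]:
                d = dist(q, self.points[idx])
                if d <= r:
                    found.append((d, idx))
        return found

    def knn(self, q, k, r0=0.1):
        """k nearest neighbours of q, doubling the search radius until enough are found."""
        target = min(k, len(self.points))
        r = r0
        while True:
            found = self._range(q, r)
            # All points within r are in `found`, so once it holds at least
            # `target` points, its k smallest distances are the true k nearest.
            if len(found) >= target:
                found.sort()
                return [self.points[idx] for _, idx in found[:k]]
            r *= 2


def idistance_knn_join(R, S, k):
    """KNN join: index S once, then run one KNN query per point of R."""
    index = IDistanceIndex(S)
    return [index.knn(r, k) for r in R]
```

Indexing S once and probing it for every point of R is what makes the join "index-based". Each probe touches only the key ranges whose partitions can intersect the current search sphere, rather than all of S.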