The similarity join has become an important database primitive to support similarity search and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Two types of similarity join are well known: the distance range join, where the user defines a distance threshold for the join, and the closest point query or k-distance join, which retrieves the k most similar pairs. In this paper, we investigate an important third similarity join operation called the k-nearest neighbor join, which combines each point of one point set with its k nearest neighbors in the other set. It has been shown that many standard algorithms of Knowledge Discovery in Databases (KDD), such as k-means and k-medoid clustering, nearest neighbor classification, data cleansing, and postprocessing of sampling-based data mining, can be implemented on top of the k-nn join operation to achieve performance improvements without affecting the quality of the result of these algorithms. We propose a new algorithm to compute the k-nearest neighbor join using the multipage index (MuX), a specialized index structure for the similarity join. To reduce both CPU and I/O cost, we develop optimal loading and processing strategies.
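To make the difference between the three join types concrete, the following is a minimal brute-force sketch in Python. It only illustrates the semantics of each operation on small in-memory point sets; it is not the MuX-based algorithm proposed in the paper, and it makes no attempt to reduce CPU or I/O cost. The function names, the use of Euclidean distance, and the tuple representation of points are assumptions made for this illustration.

```python
import heapq
import math
from itertools import product


def distance_range_join(r_points, s_points, eps):
    """All pairs (r, s) whose distance does not exceed the threshold eps."""
    return [(r, s) for r, s in product(r_points, s_points)
            if math.dist(r, s) <= eps]


def k_distance_join(r_points, s_points, k):
    """The k most similar pairs (r, s) over the whole cross product."""
    return heapq.nsmallest(k, product(r_points, s_points),
                           key=lambda pair: math.dist(pair[0], pair[1]))


def knn_join(r_points, s_points, k):
    """Each point r of R paired with its k nearest neighbors in S."""
    result = []
    for r in r_points:
        # Keep only the k points of S closest to this particular r.
        neighbors = heapq.nsmallest(k, s_points,
                                    key=lambda s: math.dist(r, s))
        result.extend((r, s) for s in neighbors)
    return result


if __name__ == "__main__":
    # Hypothetical toy data, chosen only to show the differing result sizes.
    R = [(0, 0), (10, 10)]
    S = [(1, 0), (0, 2), (9, 10), (20, 20)]
    print(distance_range_join(R, S, 1.5))
    print(k_distance_join(R, S, 3))
    print(knn_join(R, S, 2))
```

Note the different result cardinalities: the size of the distance range join depends on the threshold, the k-distance join returns k pairs in total, and the k-nn join returns k pairs for every point of the first set (provided the second set holds at least k points).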