Top-k Closest Pairs Join Query: An Approximate Algorithm for Large High Dimensional Data
IDEAS '04 Proceedings of the International Database Engineering and Applications Symposium
In this paper we present a novel approximate algorithm for computing the top-k closest pairs join query of two large, high-dimensional data sets. The algorithm has worst-case time complexity O(d^2 nk) and space complexity O(nd), and it guarantees a solution within an O(d^(1+1/t)) factor of the exact one, where t ∈ {1, 2, ..., ∞} denotes the Minkowski metric L_t of interest and d the dimensionality. It uses a space-filling curve to establish an order among the points of the space and performs at most d + 1 sorts and scans of the two data sets. During a scan, each point from one data set is compared with its closest points in the other data set, according to the space-filling curve order, and points whose contribution to the solution has already been analyzed are detected and eliminated. Experimental results on real and synthetic data sets show that the algorithm behaves as an exact algorithm in low-dimensional spaces; that it can prune the entire data set, or a considerable fraction of it, even in high dimensions if certain separation conditions are satisfied; and that in all cases it returns a solution within a small error of the exact one.
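The sort-and-scan idea can be illustrated with a minimal Python sketch. This is not the algorithm of the paper: it swaps in a Z-order (Morton) curve as the space-filling curve, performs a single sort and scan rather than up to d + 1, and omits both the elimination step and the approximation guarantee. The function names, the `window` parameter (how many curve-order neighbors to inspect on each side), and the assumption that points are tuples of non-negative integers are all hypothetical choices made for illustration.

```python
import heapq
from bisect import bisect_left


def morton_key(point, bits=10):
    """Z-order key: interleave the low `bits` bits of each integer coordinate."""
    d = len(point)
    key = 0
    for b in range(bits):
        for i, c in enumerate(point):
            key |= ((c >> b) & 1) << (b * d + i)
    return key


def minkowski(p, q, t=2):
    """Minkowski distance L_t between two points; t may be float('inf')."""
    if t == float("inf"):
        return max(abs(a - b) for a, b in zip(p, q))
    return sum(abs(a - b) ** t for a, b in zip(p, q)) ** (1.0 / t)


def approx_top_k_closest_pairs(A, B, k, window=2, t=2, bits=10):
    """Heuristic top-k closest pairs between A and B (no quality guarantee).

    Assumption: every point is a tuple of non-negative ints below 2**bits.
    """
    # Sort B once along the curve and keep the keys for binary search.
    sorted_b = sorted(B, key=lambda p: morton_key(p, bits))
    keys_b = [morton_key(p, bits) for p in sorted_b]

    candidates = []
    for p in A:
        # Locate p in B's curve order and inspect only nearby entries.
        pos = bisect_left(keys_b, morton_key(p, bits))
        lo = max(0, pos - window)
        hi = min(len(sorted_b), pos + window)
        for q in sorted_b[lo:hi]:
            candidates.append((minkowski(p, q, t), p, q))

    # Keep the k candidate pairs with the smallest distance.
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    A = [(0, 0), (10, 10)]
    B = [(0, 1), (10, 12), (50, 50)]
    for dist, p, q in approx_top_k_closest_pairs(A, B, k=2):
        print(p, q, dist)
```

A single curve order can place spatially close points far apart, which is why a one-pass sketch like this can miss true closest pairs; the abstract's repeated sorts and scans address that weakness, and this sketch does not attempt to reproduce them.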