SIGMOD '89 Proceedings of the 1989 ACM SIGMOD international conference on Management of data
Clone join and shadow join: two parallel spatial join algorithms
Proceedings of the 8th ACM international symposium on Advances in geographic information systems
Data Partitioning for Parallel Spatial Join Processing
Geoinformatica
Parallel Processing of Spatial Joins Using R-trees
ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
VLDB '90 Proceedings of the 16th International Conference on Very Large Data Bases
Parallel R-Tree Spatial Join for a Shared-Nothing Architecture
DANTE '99 Proceedings of the 1999 International Symposium on Database Applications in Non-Traditional Environments
The k-Nearest Neighbour Join: Turbo Charging the KDD Process
Knowledge and Information Systems
Distributed computation of the knn graph for large high-dimensional point sets
Journal of Parallel and Distributed Computing
Efficient index-based KNN join processing for high-dimensional data
Information and Software Technology
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Data-Parallel Spatial Join Algorithms
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 03
Gorder: an efficient method for KNN join processing
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Spatial Queries Evaluation with MapReduce
GCC '09 Proceedings of the 2009 Eighth International Conference on Grid and Cooperative Computing
High-dimensional kNN joins with incremental updates
Geoinformatica
Efficient parallel set-similarity joins using MapReduce
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Indexing multi-dimensional data in a cloud system
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Efficient B-tree based indexing for cloud data processing
Proceedings of the VLDB Endowment
Voronoi-Based Geospatial Query Processing with MapReduce
CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
RanKloud: a scalable ranked query processing framework on hadoop
Proceedings of the 14th International Conference on Extending Database Technology
Processing theta-joins using MapReduce
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Llama: leveraging columnar storage for scalable join processing in the MapReduce framework
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Parallel construction of k-nearest neighbor graphs for point clouds
SPBG'08 Proceedings of the Fifth Eurographics / IEEE VGTC conference on Point-Based Graphics
Efficient processing of k nearest neighbor joins using MapReduce
Proceedings of the VLDB Endowment
CudaGIS: report on the design and realization of a massive data parallel GIS on GPUs
Proceedings of the Third ACM SIGSPATIAL International Workshop on GeoStreaming
Speeding up large-scale point-in-polygon test based spatial join on GPUs
Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Distributed data management using MapReduce
ACM Computing Surveys (CSUR)
CG_Hadoop: computational geometry in MapReduce
Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
A demonstration of SpatialHadoop: an efficient mapreduce framework for spatial data
Proceedings of the VLDB Endowment
DIMO: distributed index for matching multimedia objects using MapReduce
Proceedings of the 5th ACM Multimedia Systems Conference
ComMapReduce: An improvement of MapReduce with lightweight communication mechanisms
Data & Knowledge Engineering
In data mining applications and in spatial and multimedia databases, a useful tool is the kNN join, which produces, for every point in a dataset R, its k nearest neighbors (NN) from a dataset S. Because it combines a join with NN search, performing kNN joins efficiently is challenging. Meanwhile, the amount of data that applications must process continues to grow quickly (exponentially in some cases). A popular model for large-scale data processing is a shared-nothing cluster of commodity machines running MapReduce [6]. Executing kNN joins efficiently on large data stored in a MapReduce cluster is therefore a problem that meets many practical needs. This work proposes novel exact and approximate algorithms in MapReduce to perform efficient parallel kNN joins on large data. We demonstrate our ideas using Hadoop. Extensive experiments on large real and synthetic datasets, with tens or hundreds of millions of records in both R and S and up to 30 dimensions, demonstrate the efficiency, effectiveness, and scalability of our methods.
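To make the kNN join definition concrete, the following is a minimal single-machine sketch in Python. It is not the exact or approximate algorithms the paper proposes; it only illustrates the operation itself and a naive block-partitioned variant in which each (R block, S block) pair is processed independently and the per-point candidates are then merged, loosely mirroring a map phase and a reduce phase. The function names, the round-robin split, and the Euclidean distance are assumptions made for illustration.

```python
import heapq
import math
from collections import defaultdict
from itertools import product


def knn_join_bruteforce(R, S, k):
    """For each point in R, return the indices of its k nearest points in S."""
    result = {}
    for i, r in enumerate(R):
        # (distance, index) pairs; ties on distance are broken by index.
        nearest = heapq.nsmallest(
            k, ((math.dist(r, s), j) for j, s in enumerate(S))
        )
        result[i] = [j for _, j in nearest]
    return result


def split(points, n):
    """Assumed partitioning: round-robin split into n blocks of (index, point)."""
    blocks = [[] for _ in range(n)]
    for idx, p in enumerate(points):
        blocks[idx % n].append((idx, p))
    return blocks


def knn_join_blocked(R, S, k, n=2):
    """Naive block-partitioned kNN join (illustrative, runs in one process)."""
    r_blocks, s_blocks = split(R, n), split(S, n)
    candidates = defaultdict(list)

    # "Map" step: every pair of blocks yields local k nearest neighbors.
    for rb, sb in product(r_blocks, s_blocks):
        for i, r in rb:
            local = heapq.nsmallest(k, ((math.dist(r, s), j) for j, s in sb))
            candidates[i].extend(local)

    # "Reduce" step: merge the local candidates of each R point, keep the best k.
    return {
        i: [j for _, j in heapq.nsmallest(k, cands)]
        for i, cands in candidates.items()
    }


if __name__ == "__main__":
    R = [(0, 0), (5, 5)]
    S = [(1, 0), (0, 2), (5, 6), (9, 9)]
    print(knn_join_bruteforce(R, S, k=2))
    print(knn_join_blocked(R, S, k=2, n=2))
```

The blocked variant returns the same answer as the brute-force one because any true k-nearest neighbor of a point must also be among the k nearest within its own S block. Its cost is that every R block is paired with every S block, which is the kind of overhead that motivates more careful partitioning on a real cluster.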