Spatial data analysis applications are emerging in a wide range of domains, such as building information management, environmental assessment, and medical imaging. Time-consuming computational geometry algorithms make these applications slow, even for medium-sized datasets. At the same time, the number of available processing cores is expanding rapidly through multicore machines and Cloud computing. The confluence of these trends demands effective parallelization of spatial query processing. Unfortunately, traditional parallel spatial databases are ill-equipped to deal with the performance heterogeneity that is common in the Cloud. We introduce Niharika, a parallel spatial data analysis infrastructure that exploits all available cores in a heterogeneous cluster. Niharika first uses a declustering technique to create balanced spatial partitions. It then uses dynamic load balancing to adapt to performance heterogeneity and to processing skew in the spatial dataset. We evaluate Niharika with three load-balancing algorithms and two spatial datasets (both from TIGER) on Amazon EC2 instances. Niharika adapts to the performance heterogeneity of the EC2 nodes, achieving excellent speedups (in the best case, 63.6x on 64 cores across 16 four-core EC2 nodes) and outperforming an approach that does not adapt.
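To illustrate why adapting to performance heterogeneity matters, the following minimal sketch compares a fixed round-robin assignment of partitions with a pull-based scheme in which an idle worker takes the next partition. It is a toy simulation, not one of the three load-balancing algorithms evaluated with Niharika: the function names, the partition costs, and the worker speeds are all hypothetical values chosen for illustration.

```python
import heapq


def static_makespan(costs, speeds):
    """Fixed round-robin assignment: partition i goes to worker i % n.

    Returns the time at which the slowest worker finishes.
    """
    finish = [0.0] * len(speeds)
    for i, cost in enumerate(costs):
        w = i % len(speeds)
        finish[w] += cost / speeds[w]
    return max(finish)


def dynamic_makespan(costs, speeds):
    """Pull-based assignment: the worker that becomes idle first
    takes the next partition from a shared queue.

    Returns the time at which the last partition completes.
    """
    # Min-heap of (time the worker becomes idle, worker id).
    idle = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(idle)
    makespan = 0.0
    for cost in costs:
        t, w = heapq.heappop(idle)
        t += cost / speeds[w]
        makespan = max(makespan, t)
        heapq.heappush(idle, (t, w))
    return makespan


# Hypothetical workload: 16 equally expensive partitions (balanced
# declustering) on four workers with unequal relative speeds.
costs = [4.0] * 16
speeds = [1.0, 1.0, 2.0, 4.0]

print("static :", static_makespan(costs, speeds))
print("dynamic:", dynamic_makespan(costs, speeds))
```

With these made-up numbers, the static assignment finishes at time 16.0 because the two slowest workers each hold a quarter of the work, while the pull-based scheme finishes at 8.0, which equals total work divided by total speed. Balanced partitions alone do not help when the nodes themselves run at different speeds.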