Data Partitioning for Parallel Spatial Join Processing

Authors:
Xiaofang Zhou;David J. Abel;David Truffet
Affiliations:
CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, ACT 2601, Australia {xiaofang.zhou, dave.abel, david.truffet}@cmis.csiro.au
Venue:
Geoinformatica
Year:
1998

Abstract

The cost of spatial join processing can be very high because of the large sizes of spatial objects and the computation-intensive spatial operations. While parallel processing seems a natural solution to this problem, it is not clear how spatial data can be partitioned for this purpose. Various spatial data partitioning methods are examined in this paper. A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases and the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing. Object duplication caused by multi-assignment in spatial data partitioning can result in extra CPU cost as well as extra communication cost. We find that the key to overcoming this problem is to preserve spatial locality in task decomposition. In this paper we show that a near-optimal speedup can be achieved for parallel spatial join processing using our new algorithms.
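
To make the notions of multi-assignment and the filter step concrete, the following is a minimal sketch and not the algorithms proposed in the paper. It assumes a uniform grid as the partitioning scheme and represents each object by its minimum bounding rectangle (MBR) as an `(xmin, ymin, xmax, ymax)` tuple. The function names (`cells_for`, `partition`, `filter_step`) and the `cell_size` parameter are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def cells_for(mbr, cell_size):
    """Return every grid cell overlapped by the rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = mbr
    xs = range(int(xmin // cell_size), int(xmax // cell_size) + 1)
    ys = range(int(ymin // cell_size), int(ymax // cell_size) + 1)
    return list(product(xs, ys))


def intersects(a, b):
    """True if two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partition(objects, cell_size):
    """Multi-assignment: copy each object id into every cell its MBR overlaps."""
    buckets = defaultdict(list)
    for oid, mbr in objects.items():
        for cell in cells_for(mbr, cell_size):
            buckets[cell].append(oid)
    return buckets


def filter_step(r, s, cell_size):
    """Return candidate pairs whose MBRs intersect.

    Each cell could be handed to a separate processor (assumption: one task
    per cell). The set removes pairs reported by more than one cell.
    """
    r_buckets = partition(r, cell_size)
    s_buckets = partition(s, cell_size)
    candidates = set()
    for cell, r_ids in r_buckets.items():
        for rid in r_ids:
            for sid in s_buckets.get(cell, []):
                if intersects(r[rid], s[sid]):
                    candidates.add((rid, sid))
    return candidates


# Illustrative data only (not from the paper).
R = {"r1": (0, 0, 3, 3), "r2": (8, 8, 9, 9), "r3": (4, 4, 6, 6)}
S = {"s1": (2, 2, 6, 6), "s2": (20, 20, 21, 21)}

pairs = filter_step(R, S, cell_size=5)
copies_of_s = sum(len(ids) for ids in partition(S, 5).values())
```

In this toy data, `s1` straddles four cells, so it is stored four times; this is the object duplication the abstract refers to. The pair `("r3", "s1")` is found independently in all four of those cells and has to be de-duplicated. A refinement step on exact geometries would then run over `pairs`; it is omitted here.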