Sort-based parallel loading of R-trees

Authors: Daniar Achakeev, Marc Seidemann, Markus Schmidt, Bernhard Seeger
Affiliation: University of Marburg, Marburg, Germany (all authors)
Venue: Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data
Year: 2012

Abstract

Due to the increasing amount of spatial data, parallel algorithms for processing big spatial data are becoming increasingly important. In particular, the shared-nothing architecture is attractive because it offers low-cost data processing. Moreover, popular MapReduce frameworks such as Hadoop allow developers to write conceptually simple and scalable algorithms for processing big data on this architecture. In this work, we address the problem of parallel loading of R-trees on a shared-nothing platform. The R-tree is a key element for efficient query processing in large spatial databases, but its creation is expensive. We propose a novel scalable parallel loading algorithm for MapReduce. The core of our parallel loading is the state-of-the-art sequential sort-based query-adaptive R-tree loading algorithm, which builds R-trees optimized according to a commonly used cost model. In contrast to previous methods for loading R-trees with MapReduce, we construct the R-tree level-wise. Our experimental results show an almost linear speedup with respect to the number of machines. Moreover, the resulting R-trees provide better query performance than R-trees built by other competitive bulk-loading algorithms.
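To make the level-wise, sort-based idea concrete, here is a minimal single-machine sketch that simulates MapReduce rounds in plain Python. It rests on several assumptions that do not come from the abstract. The function names (`map_phase`, `reduce_phase`, `bulk_load_levels`, `expected_node_accesses`) are hypothetical and are neither the authors' code nor Hadoop's API. Partitioning the input into contiguous x-ranges after sorting by x-center is an assumed scheme. The per-partition step uses plain fixed-capacity, STR-style packing as a stand-in for the query-adaptive loading algorithm the abstract describes. The cost function is the common window-query model for data normalized to the unit square, assumed here to play the role of the "commonly used cost model".

```python
from typing import List, Sequence, Tuple

# A rectangle is (xmin, ymin, xmax, ymax); points are degenerate rectangles.
Rect = Tuple[float, float, float, float]


def mbr(rects: Sequence[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty group of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def map_phase(rects: Sequence[Rect], num_partitions: int) -> List[List[Rect]]:
    """Simulated map/shuffle (assumed scheme): sort by x-center, cut into x-ranges."""
    ordered = sorted(rects, key=lambda r: r[0] + r[2])  # 2 * x-center gives the same order
    size = -(-len(ordered) // num_partitions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def reduce_phase(partition: Sequence[Rect], capacity: int) -> List[Rect]:
    """Simulated reducer: sort one x-range by y-center, pack `capacity` entries per node.

    Stand-in for the query-adaptive partitioning; this is plain STR-style packing.
    """
    ordered = sorted(partition, key=lambda r: r[1] + r[3])
    return [mbr(ordered[i:i + capacity]) for i in range(0, len(ordered), capacity)]


def bulk_load_levels(rects: Sequence[Rect], capacity: int,
                     num_partitions: int) -> List[List[Rect]]:
    """Build the tree bottom-up, one level per simulated MapReduce round.

    Returns the node MBRs of every level, leaves first and the root last.
    """
    if not rects or capacity < 2:
        raise ValueError("need at least one rectangle and capacity >= 2")
    levels: List[List[Rect]] = []
    current = list(rects)
    while True:
        # Use fewer partitions on small upper levels so each one fills at least a node.
        parts = max(1, min(num_partitions, len(current) // capacity))
        nodes: List[Rect] = []
        for partition in map_phase(current, parts):  # independent, so parallelizable
            nodes.extend(reduce_phase(partition, capacity))
        levels.append(nodes)
        if len(nodes) == 1:
            return levels
        current = nodes  # node MBRs become the input of the next level


def expected_node_accesses(levels: Sequence[Sequence[Rect]],
                           qx: float, qy: float) -> float:
    """Assumed cost model: unit-square data, qx-by-qy windows placed uniformly.

    A node with extents w x h is accessed with probability about (w + qx) * (h + qy).
    """
    return sum((r[2] - r[0] + qx) * (r[3] - r[1] + qy)
               for level in levels for r in level)


if __name__ == "__main__":
    points = [(i / 10, j / 10, i / 10, j / 10) for i in range(10) for j in range(10)]
    levels = bulk_load_levels(points, capacity=10, num_partitions=4)
    print([len(level) for level in levels])  # [12, 2, 1]
    print(expected_node_accesses(levels, 0.1, 0.1))
```

Each iteration of the `while` loop corresponds to one level of the tree, mirroring the level-wise construction named in the abstract. In this sketch the only parallelism is that the reduce calls for different partitions are independent; a real deployment would also have to distribute the sort, and a query-adaptive reducer would choose node sizes to minimize the cost function rather than always filling nodes to `capacity`.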