Sort-based parallel loading of R-trees

  • Authors:
  • Daniar Achakeev; Marc Seidemann; Markus Schmidt; Bernhard Seeger

  • Affiliations:
  • University of Marburg, Marburg, Germany; University of Marburg, Marburg, Germany; University of Marburg, Marburg, Germany; University of Marburg, Marburg, Germany

  • Venue:
  • Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data
  • Year:
  • 2012

Abstract

Due to the increasing amount of spatial data, parallel algorithms for processing big spatial data are becoming more and more important. In particular, the shared-nothing architecture is attractive because it offers low-cost data processing. Moreover, popular MapReduce frameworks such as Hadoop allow the development of conceptually simple and scalable algorithms for processing big data on this architecture. In this work, we address the problem of parallel loading of R-trees on a shared-nothing platform. The R-tree is a key element for efficient query processing in large spatial databases, but its creation is expensive. We propose a novel scalable parallel loading algorithm for MapReduce. The core of our parallel loading is the state-of-the-art sequential sort-based query-adaptive R-tree loading algorithm, which builds R-trees optimized according to a commonly used cost model. In contrast to previous methods for loading R-trees with MapReduce, we construct the R-tree level-wise. Our experimental results show an almost linear speedup in the number of machines. Moreover, the resulting R-trees provide better query performance than R-trees built by other competitive bulk-loading algorithms.
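To make the idea of sort-based, level-wise R-tree construction concrete, here is a minimal sequential sketch in Python. It is not the authors' query-adaptive algorithm: it ignores the cost model and MapReduce entirely and swaps in a deliberately simple packing that sorts entries by the center of their rectangles (x first, then y) and fills nodes of a hypothetical fixed `capacity` in sorted order. The same sort-and-pack step is then repeated on the node MBRs of the level just built until a single root remains. The names `bulk_load`, `pack_level`, and `window_query` are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Axis-aligned rectangle: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


@dataclass
class Node:
    mbr: Rect        # minimum bounding rectangle of all children
    children: list   # data Rects at the leaf level, child Nodes otherwise
    leaf: bool


def union(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def center_key(r: Rect) -> Tuple[float, float]:
    # Hypothetical sort order: x-center, ties broken by y-center.
    # The paper's query-adaptive loader uses a cost model instead.
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def pack_level(items: list, mbrs: List[Rect], capacity: int, leaf: bool) -> List[Node]:
    """Sort entries by center, then pack consecutive runs into nodes."""
    order = sorted(range(len(items)), key=lambda i: center_key(mbrs[i]))
    nodes = []
    for start in range(0, len(order), capacity):
        group = order[start:start + capacity]
        nodes.append(Node(union([mbrs[i] for i in group]),
                          [items[i] for i in group], leaf))
    return nodes


def bulk_load(rects: List[Rect], capacity: int = 4) -> Tuple[Node, int]:
    """Build the tree bottom-up, one level per pass; return (root, height)."""
    if not rects:
        raise ValueError("cannot bulk-load an empty input")
    level = pack_level(rects, rects, capacity, leaf=True)
    height = 1
    while len(level) > 1:
        # Each pass only needs the MBRs of the level below.
        level = pack_level(level, [n.mbr for n in level], capacity, leaf=False)
        height += 1
    return level[0], height


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def window_query(node: Node, q: Rect) -> List[Rect]:
    """Return all data rectangles intersecting the query window q."""
    if not intersects(node.mbr, q):
        return []
    if node.leaf:
        return [r for r in node.children if intersects(r, q)]
    out: List[Rect] = []
    for child in node.children:
        out.extend(window_query(child, q))
    return out


if __name__ == "__main__":
    data = [(float(i), float(i), i + 0.5, i + 0.5) for i in range(10)]
    root, height = bulk_load(data, capacity=4)
    print(height, root.mbr)
    print(window_query(root, (2.0, 2.0, 3.0, 3.0)))
```

Because each pass depends only on the MBRs produced by the pass below it, every level is a self-contained sort-and-pack step. One plausible reading of the level-wise approach in the abstract is that such steps map naturally onto a sequence of parallel jobs, with the sort distributed across machines; the sketch above runs them sequentially.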