Experiences on Processing Spatial Data with MapReduce

  • Authors:
  • Ariel Cary;Zhengguo Sun;Vagelis Hristidis;Naphtali Rishe

  • Affiliations:
  • School of Computing and Information Sciences, Florida International University, Miami, FL 33199 (all authors)

  • Venue:
  • SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
  • Year:
  • 2009

Quantified Score

Hi-index 0.03

Abstract

The amount of information in spatial databases is growing as more data is made available. Spatial databases mainly store two types of data: raster data (satellite and aerial digital images) and vector data (points, lines, polygons). The complexity and nature of spatial databases make them ideal for applying parallel processing. MapReduce is an emerging massively parallel computing model proposed by Google. In this work, we present our experiences in applying the MapReduce model to solve two important spatial problems: (a) bulk construction of R-Trees and (b) aerial image quality computation, which involve vector and raster data, respectively. We present our results on the scalability of MapReduce and on the effect of parallelism on the quality of the results. Our algorithms were executed on a Google & IBM cluster, which became available to us through an NSF-supported program. The cluster supports the Hadoop framework, an open source implementation of MapReduce. Our results confirm the excellent scalability of the MapReduce framework in processing parallelizable problems.