Experiences on Processing Spatial Data with MapReduce

Authors:
Ariel Cary;Zhengguo Sun;Vagelis Hristidis;Naphtali Rishe
Affiliations:
School of Computing and Information Sciences, Florida International University, Miami, FL 33199;School of Computing and Information Sciences, Florida International University, Miami, FL 33199;School of Computing and Information Sciences, Florida International University, Miami, FL 33199;School of Computing and Information Sciences, Florida International University, Miami, FL 33199
Venue:
SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
Year:
2009

Abstract

The amount of information in spatial databases is growing as more data is made available. Spatial databases mainly store two types of data: raster data (satellite and aerial digital images) and vector data (points, lines, polygons). The complexity and nature of spatial databases make them ideal candidates for parallel processing. MapReduce is an emerging massively parallel computing model proposed by Google. In this work, we present our experiences in applying the MapReduce model to solve two important spatial problems: (a) bulk construction of R-Trees and (b) aerial image quality computation, which involve vector and raster data, respectively. We present our results on the scalability of MapReduce and on the effect of parallelism on the quality of the results. Our algorithms were executed on a Google & IBM cluster, which became available to us through an NSF-supported program. The cluster supports the Hadoop framework, an open source implementation of MapReduce. Our results confirm the excellent scalability of the MapReduce framework in processing parallelizable problems.
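
As an illustration of how bulk construction of an R-Tree might be phrased in the MapReduce model, the following is a minimal single-process sketch, not the authors' implementation. The abstract does not state how data is partitioned; the Z-order (space-filling curve) partitioning used here is an assumption, as are all function names, the leaf capacity, and the restriction to non-empty sets of points with non-negative integer coordinates below 2^16. The map step assigns each point to a partition, the reduce step packs each partition into leaf nodes with bounding rectangles, and a final step joins the partition subtrees under one root.

```python
from collections import defaultdict

def z_value(x, y, bits=16):
    """Interleave the bits of x and y into a Z-order (Morton) code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def map_phase(points, num_partitions, bits=16):
    """Map: emit (partition_id, point); each partition owns a contiguous Z-value range."""
    max_z = 1 << (2 * bits)
    for x, y in points:
        pid = z_value(x, y, bits) * num_partitions // max_z
        yield pid, (x, y)

def shuffle(pairs):
    """Group mapped values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def mbr(rects):
    """Minimum bounding rectangle of (xmin, ymin, xmax, ymax) tuples."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def reduce_phase(pid, pts, leaf_capacity=4):
    """Reduce: sort one partition by Z-value and pack its points into leaf nodes."""
    pts = sorted(pts, key=lambda p: z_value(*p))
    leaves = []
    for i in range(0, len(pts), leaf_capacity):
        chunk = pts[i:i + leaf_capacity]
        box = mbr([(x, y, x, y) for x, y in chunk])
        leaves.append({"mbr": box, "points": chunk})
    return {"partition": pid,
            "mbr": mbr([leaf["mbr"] for leaf in leaves]),
            "children": leaves}

def build_tree(points, num_partitions=4, leaf_capacity=4):
    """Run map, shuffle, and reduce, then join the partition subtrees under one root."""
    groups = shuffle(map_phase(points, num_partitions))
    subtrees = [reduce_phase(pid, pts, leaf_capacity)
                for pid, pts in sorted(groups.items())]
    return {"mbr": mbr([s["mbr"] for s in subtrees]), "children": subtrees}
```

In an actual Hadoop job the map and reduce functions would run on separate workers and the shuffle would be handled by the framework; the dictionary-based grouping above only imitates that behavior in one process.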