A MapReduce approach to Gi*(d) spatial statistic

Authors:
Yan Liu; Kaichao Wu; Shaowen Wang; Yanli Zhao; Qian Huang
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, Illinois; Chinese Academy of Sciences (CAS), Beijing, China; University of Illinois at Urbana-Champaign, Urbana, Illinois; University of Illinois at Urbana-Champaign, Urbana, Illinois; Peking University, Beijing, China
Venue:
Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems
Year:
2010

Abstract

Managing and analyzing massive spatial datasets with GIS and spatial analysis is becoming crucial to geospatial problem-solving and decision-making. MapReduce provides a data-centric computational model through which spatial analysis can achieve high scalability. However, it is challenging to exploit the multi-dimensional characteristics of spatial data to improve the performance of spatial analysis on a MapReduce data system, which partitions data horizontally and manages it transparently. This paper tackles this challenge by developing a MapReduce-based computation of Gi*(d), a spatial statistic for detecting local clustering. Without exploiting spatial characteristics, computing Gi*(d) for a particular location requires calculating its distance to every point in the dataset, so computing it for all points requires pairwise distances across the whole dataset. A spatial locality-based storage and indexing strategy is developed to associate spatial locality with storage locality on the MapReduce platform. Using the resulting spatial index, unnecessary map tasks can be eliminated from a MapReduce job, significantly improving overall computational performance. To exploit the parallelism of the underlying storage nodes, an application-level load-balancing mechanism based on adaptive spatial domain decomposition is developed to produce even loads among map tasks. Experiments demonstrate the effectiveness of the storage and indexing strategy under different settings of the distance parameter, and show a significant reduction in execution time for all-point computation when the application-level load-balancing mechanism is used.
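The abstract gives neither the formula nor the storage layout, so what follows is only a sketch under stated assumptions. It takes the commonly used standardized form of the Getis-Ord statistic with binary weights, where $w_{ij}(d) = 1$ if point $j$ lies within distance $d$ of location $i$ (including $j = i$) and 0 otherwise; whether the paper uses exactly this variant is an assumption:

$$G_i^*(d) = \frac{\sum_{j} w_{ij}(d)\,x_j - \bar{X}\sum_{j} w_{ij}(d)}{S\sqrt{\dfrac{n\sum_{j} w_{ij}(d)^2 - \left(\sum_{j} w_{ij}(d)\right)^2}{n-1}}}$$

where $n$ is the number of points, $\bar{X} = \frac{1}{n}\sum_j x_j$ and $S = \sqrt{\frac{1}{n}\sum_j x_j^2 - \bar{X}^2}$. The Python code below simulates one MapReduce job in a single process. Square grid-cell blocks stand in for the spatial locality-based storage, and a block receives a map task only if its cell lies within $d$ of at least one target location, which is one plausible way to eliminate unnecessary map tasks. The record layout `(point_id, x, y, value)` and all function names (`partition`, `map_task`, `gi_star_job`, etc.) are hypothetical, not taken from the paper.

```python
import math
from collections import defaultdict

# Hypothetical record layout (an assumption, not from the paper): (point_id, x, y, value).


def cell_of(x, y, size):
    """Grid cell containing (x, y); stands in for a spatially derived storage key."""
    return (math.floor(x / size), math.floor(y / size))


def partition(points, size):
    """Group points into grid-cell blocks so that nearby points share a block."""
    blocks = defaultdict(list)
    for p in points:
        blocks[cell_of(p[1], p[2], size)].append(p)
    return dict(blocks)


def min_dist_to_cell(x, y, key, size):
    """Shortest distance from (x, y) to the square covered by grid cell `key`."""
    x0, y0 = key[0] * size, key[1] * size
    dx = max(x0 - x, 0.0, x - (x0 + size))
    dy = max(y0 - y, 0.0, y - (y0 + size))
    return math.hypot(dx, dy)


def map_task(block, targets, d):
    """Map: for each target, emit (target_id, (neighbor_count, neighbor_value_sum))."""
    for t in targets:
        near = [q[3] for q in block if math.hypot(t[1] - q[1], t[2] - q[2]) <= d]
        if near:
            yield t[0], (len(near), sum(near))


def gi_star_job(points, targets, d, size):
    """Simulated job: prune blocks with the grid index, map, shuffle, reduce to Gi*(d).

    Returns (scores keyed by target id, map tasks launched, total blocks).
    """
    blocks = partition(points, size)
    n = len(points)
    values = [p[3] for p in points]
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)

    shuffled = defaultdict(list)  # target_id -> list of partial (count, sum) pairs
    tasks = 0
    for key, block in blocks.items():
        # Skip the block entirely if no target's radius-d disk reaches its cell.
        relevant = [t for t in targets if min_dist_to_cell(t[1], t[2], key, size) <= d]
        if not relevant:
            continue
        tasks += 1
        for tid, partial in map_task(block, relevant, d):
            shuffled[tid].append(partial)

    scores = {}
    for t in targets:  # reduce: combine partials per target, then standardize
        w = sum(c for c, _ in shuffled[t[0]])  # binary weights, so sum(w^2) == sum(w)
        wx = sum(v for _, v in shuffled[t[0]])
        denom = s * math.sqrt((n * w - w * w) / (n - 1))
        scores[t[0]] = (wx - mean * w) / denom if denom > 0 else float("nan")
    return scores, tasks, len(blocks)
```

For example, on a 10 x 10 lattice of integer coordinates partitioned into cells of side 1.5, `gi_star_job(pts, pts[:1], 1.5, 1.5)` launches map tasks on only 3 of the 49 blocks for a target at the origin, yet yields the same score as a run over one unpartitioned block. The square grid is chosen for brevity; any index that can bound the distance from a location to a block would permit the same pruning.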
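The load-balancing mechanism is described only as adaptive spatial domain decomposition. The sketch below assumes a region quadtree as that decomposition and uses the point count of each region as its load; both are assumptions, since the real Gi*(d) cost also depends on neighborhood density and on d. Regions are split recursively until no leaf exceeds a hypothetical `capacity`, so each leaf can feed one map task of bounded size, unlike an equal-area grid that piles clustered data into a few cells. The helper names `root_box`, `quadtree_splits`, and `uniform_grid_loads` are illustrative only.

```python
import random


def root_box(points, pad=1e-6):
    """Bounding box (x0, y0, x1, y1), padded so the half-open box holds every point."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) + pad, max(ys) + pad)


def quadtree_splits(points, box, capacity, min_size=1e-9):
    """Quarter `box` recursively until each leaf holds at most `capacity` points.

    Returns non-empty leaves as (box, members); each leaf would feed one map task.
    Splitting stops early at `min_size` so duplicate points cannot recurse forever.
    """
    x0, y0, x1, y1 = box
    inside = [p for p in points if x0 <= p[0] < x1 and y0 <= p[1] < y1]
    if not inside:
        return []
    if len(inside) <= capacity or (x1 - x0) <= min_size:
        return [(box, inside)]
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    leaves = []
    for sub in ((x0, y0, xm, ym), (xm, y0, x1, ym), (x0, ym, xm, y1), (xm, ym, x1, y1)):
        leaves.extend(quadtree_splits(inside, sub, capacity, min_size))
    return leaves


def uniform_grid_loads(points, box, k):
    """Point counts of a naive k x k equal-area split of `box`, for comparison."""
    x0, y0, x1, y1 = box
    counts = [0] * (k * k)
    for x, y in points:
        i = min(int((x - x0) / (x1 - x0) * k), k - 1)
        j = min(int((y - y0) / (y1 - y0) * k), k - 1)
        counts[j * k + i] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(200)]
    pts += [(rng.uniform(0.1, 0.15), rng.uniform(0.1, 0.15)) for _ in range(800)]
    box = root_box(pts)
    print("grid max load:", max(uniform_grid_loads(pts, box, 4)))
    print("quadtree max load:", max(len(m) for _, m in quadtree_splits(pts, box, capacity=100)))
```

With the seeded data in the `__main__` block, at least 800 of the 1,000 points land in a single cell of the 4 x 4 equal-area grid, whereas no quadtree leaf holds more than 100 points, so no single map task dominates the job.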