Toward automatic parallelization of spatial computation for computing clusters

  • Authors: Baoqiang Yan; Philip J. Rhodes
  • Affiliations: University of Mississippi, University, MS, USA (both authors)
  • Venue: HPDC '08: Proceedings of the 17th International Symposium on High Performance Distributed Computing
  • Year: 2008

Abstract

High-performance parallel computing infrastructures, such as computing clusters, have recently become freely available to scientific researchers, allowing them to solve problems of unprecedented scale through data parallelization. However, scientists are not necessarily skilled in writing efficient parallel code, especially when dealing with spatial datasets. Two important performance issues are heavy I/O costs and communication overhead. To address these issues, we are developing a scheme that helps scientists realize I/O-friendly and scalable data parallelization for spatial computation. Built upon our iteration-aware spatial prefetching and caching techniques, this data parallelization scheme takes an explicit specification of data dependency, identifies the best feasible access patterns while applying a set of I/O efficiency rules, and then wraps those patterns in separate spatial data iterators: one for efficient cache loading and one for data partitioning. The scheme prioritizes, yet reconciles, the I/O costs of the different stages of a data-intensive cluster application to achieve the best overall I/O performance while maintaining fair computational scalability.
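As a rough illustration of the idea of driving both partitioning and cache loading from one dependency specification, the following Python sketch may help. It is not the authors' implementation: the function names, the 2D row-major grid, the stencil-offset form of the dependency, and the rule "partition along rows so each block is contiguous on disk" are all assumptions made for the example.

```python
from typing import Iterator, List, Tuple

Offset = Tuple[int, int]    # (row offset, column offset) of a neighbor that a cell reads
RowRange = Tuple[int, int]  # half-open range [start, stop) of grid rows


def halo_rows(dependency: List[Offset]) -> Tuple[int, int]:
    """Return how many extra rows are needed above and below a block.

    Hypothetical helper: the dependency is assumed to be a list of stencil offsets.
    """
    above = max((-dr for dr, _ in dependency), default=0)
    below = max((dr for dr, _ in dependency), default=0)
    return max(above, 0), max(below, 0)


def partition_iterator(n_rows: int, n_workers: int) -> Iterator[RowRange]:
    """Yield the contiguous block of rows owned by each worker.

    Assumed I/O rule: with row-major storage, splitting along rows keeps
    each worker's block contiguous on disk.
    """
    base, extra = divmod(n_rows, n_workers)
    start = 0
    for worker in range(n_workers):
        stop = start + base + (1 if worker < extra else 0)
        yield (start, stop)
        start = stop


def cache_load_iterator(
    n_rows: int, n_workers: int, dependency: List[Offset]
) -> Iterator[RowRange]:
    """Yield the rows each worker should load: its own block plus halo rows,
    clipped to the grid boundary."""
    above, below = halo_rows(dependency)
    for start, stop in partition_iterator(n_rows, n_workers):
        yield (max(0, start - above), min(n_rows, stop + below))


if __name__ == "__main__":
    five_point = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    print(list(partition_iterator(10, 3)))
    print(list(cache_load_iterator(10, 3, five_point)))
```

For a 10-row grid split across 3 workers with a five-point stencil, the owned blocks are rows [0, 4), [4, 7), and [7, 10), while the load ranges widen to [0, 5), [3, 8), and [6, 10). The overlap between neighboring load ranges is the data that would otherwise have to be communicated or re-read, which is the kind of trade-off between I/O cost and communication overhead that the abstract describes.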