Toward automatic parallelization of spatial computation for computing clusters

Authors:
Baoqiang Yan; Philip J. Rhodes
Affiliations:
University of Mississippi, University, MS, USA; University of Mississippi, University, MS, USA
Venue:
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Year:
2008

Abstract

High-performance parallel computing infrastructures, such as computing clusters, have recently become freely available to scientific researchers, allowing them to solve problems of unprecedented scale through data parallelization. However, scientists are not necessarily skilled in writing efficient parallel code, especially when dealing with spatial datasets. Two important performance issues are heavy I/O costs and communication overhead. To address these issues, we are developing a scheme that helps scientists realize I/O-friendly and scalable data parallelization for spatial computation. Built upon our iteration-aware spatial prefetching and caching techniques, this data parallelization scheme takes an explicit specification of data dependency, identifies the best feasible access patterns while applying some I/O efficiency rules, and then wraps them in separate spatial data iterators for efficient cache loading and data partitioning, respectively. The scheme prioritizes but reconciles the I/O costs in the different stages of a data-intensive cluster application to achieve the best overall I/O performance while maintaining fair computational scalability.
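As a rough illustration of how an explicit data dependency specification can drive partitioning, the sketch below derives halo widths from a stencil-style list of offsets and splits a grid into slabs along its slowest-varying axis. It is not the authors' scheme: the names `halo_widths`, `partition_slabs`, and `SEVEN_POINT` are hypothetical, and it assumes row-major storage and a slab-only decomposition to keep each worker's reads contiguous.

```python
from typing import Iterator, List, Sequence, Tuple

Offset = Tuple[int, ...]
Bounds = List[Tuple[int, int]]  # per-axis half-open [start, stop)

# Hypothetical dependency spec: each output cell reads itself and
# its six face neighbours (a 7-point stencil on a 3D grid).
SEVEN_POINT: List[Offset] = [
    (0, 0, 0),
    (-1, 0, 0), (1, 0, 0),
    (0, -1, 0), (0, 1, 0),
    (0, 0, -1), (0, 0, 1),
]


def halo_widths(offsets: Sequence[Offset]) -> List[Tuple[int, int]]:
    """Return the per-axis (low, high) halo implied by the offsets."""
    ndim = len(offsets[0])
    return [
        (
            max(0, -min(o[axis] for o in offsets)),
            max(0, max(o[axis] for o in offsets)),
        )
        for axis in range(ndim)
    ]


def partition_slabs(
    shape: Sequence[int], offsets: Sequence[Offset], workers: int
) -> Iterator[Tuple[Bounds, Bounds]]:
    """Yield (owned, to_load) bounds for each worker.

    The grid is split along axis 0 only. Assuming row-major storage,
    each slab is then one contiguous byte range in the file, which is
    the I/O-friendliness assumption of this sketch.
    """
    low, high = halo_widths(offsets)[0]
    n = shape[0]
    base, extra = divmod(n, workers)
    start = 0
    for w in range(workers):
        # Spread the remainder over the first `extra` workers.
        stop = start + base + (1 if w < extra else 0)
        rest = [(0, s) for s in shape[1:]]
        owned = [(start, stop)] + rest
        # Extend by the halo, clamped to the grid boundary.
        to_load = [(max(0, start - low), min(n, stop + high))] + rest
        yield owned, to_load
        start = stop


if __name__ == "__main__":
    for owned, to_load in partition_slabs((10, 4, 4), SEVEN_POINT, 3):
        print("compute", owned[0], "load", to_load[0])
```

Each worker computes only on its owned range but loads the slightly larger range, so neighbouring values are available locally without a separate exchange step. A real system would also weigh communication against redundant reads, which this sketch ignores.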