DataStager: scalable data staging services for petascale applications

Authors:
Hasan Abbasi; Matthew Wolf; Greg Eisenhauer; Scott Klasky; Karsten Schwan; Fang Zheng
Affiliations:
College of Computing, Georgia Institute of Technology, Atlanta, USA; College of Computing, Georgia Institute of Technology, Atlanta, USA; College of Computing, Georgia Institute of Technology, Atlanta, USA; Oak Ridge National Laboratory, Oak Ridge, USA; College of Computing, Georgia Institute of Technology, Atlanta, USA; College of Computing, Georgia Institute of Technology, Atlanta, USA
Venue:
Cluster Computing
Year:
2010

Abstract

Two known challenges for petascale machines are that (1) the costs of I/O for high performance applications can be substantial, especially for output tasks like checkpointing, and (2) noise from I/O actions can inject undesirable delays into the runtimes of such codes on individual compute nodes. This paper introduces the flexible 'DataStager' framework for data staging, along with alternative services within it that jointly address (1) and (2). Data staging services, which move output data from compute nodes to staging or I/O nodes prior to storage, reduce the I/O overheads on applications' total processing times, and explicit management of data staging reduces perturbation when extracting output data from a petascale machine's compute partition. Experimental evaluations of DataStager on the Cray XT machine at Oak Ridge National Laboratory, using the GTC fusion modeling code and benchmarks running on 1000+ processors, establish both the necessity of intelligent data staging and the high performance of our approach.