Just in time: adding value to the IO pipelines of high performance applications with JITStaging

Authors:
Hasan Abbasi;Greg Eisenhauer;Matthew Wolf;Karsten Schwan;Scott Klasky
Affiliations:
Georgia Institute of Technology, Atlanta, GA, USA;Georgia Institute of Technology, Atlanta, GA, USA;Georgia Institute of Technology, Atlanta, GA, USA;Georgia Institute of Technology, Atlanta, GA, USA;Oak Ridge National Laboratory, Oak Ridge, TN, USA
Venue:
Proceedings of the 20th international symposium on High performance distributed computing
Year:
2011

Abstract

Large scale applications are generating a tsunami of data, with understanding driven by finding information hidden within this data. The ever-increasing sizes of output, however, are making it difficult for science users to inspect the data generated by their applications, understand its important properties, and/or organize it for subsequent analysis and visualization. This paper presents JITStager, a software infrastructure with which end users can dynamically customize, and thus add value to, the output pipelines of their HEC applications. JITStager is able to customize data at scale by leveraging the computational power of both compute nodes and additional 'data staging' nodes allocated by end users. It uses existing, componentized I/O interfaces to decouple the compile-time specification of the program from the run-time customization of the data pipeline, and employs efficient runtime methods for binary code generation and data movement to create custom pipelines for applications' output processes. These pipelines provide end users with improved insights into the data being produced, without burdening the application's computational performance and without impeding output performance. This paper describes the JITStager architecture, evaluates its performance, and demonstrates the advantages derived from its use with representative HPC applications.