Combining in-situ and in-transit processing to enable extreme-scale scientific analysis

  • Authors:
  • Janine C. Bennett; Hasan Abbasi; Peer-Timo Bremer; Ray Grout; Attila Gyulassy; Tong Jin; Scott Klasky; Hemanth Kolla; Manish Parashar; Valerio Pascucci; Philippe Pebay; David Thompson; Hongfeng Yu; Fan Zhang; Jacqueline Chen

  • Affiliations:
  • Sandia National Laboratories; Oak Ridge National Laboratory; Lawrence Livermore National Laboratory; National Renewable Energy Laboratory; University of Utah; Rutgers University; Oak Ridge National Laboratory; Sandia National Laboratories; Rutgers University; University of Utah; Kitware; Kitware; Sandia National Laboratories; Rutgers University; Sandia National Laboratories

  • Venue:
  • SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
  • Year:
  • 2012

Abstract

With the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-process-centric to a concurrent approach based on either in-situ or in-transit processing. In this context, computations are considered in-situ if they utilize the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper we explore the design and implementation of three common analysis techniques typically performed on large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system to coordinate the execution of various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks that support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight.
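
As a rough illustration of the in-situ/in-transit split described in the abstract (and not the paper's actual ADIOS/DataSpaces implementation), the following hypothetical Python sketch runs a cheap reduction inline with a mock simulation loop while offloading a heavier analysis to a separate staging process over an asynchronous, bounded queue. All function names, parameters, and data here are invented for illustration only.

    import multiprocessing as mp

    import numpy as np


    def in_transit_analysis(queue):
        """Stand-in for analysis running on secondary (staging) resources."""
        while True:
            item = queue.get()
            if item is None:  # sentinel: the simulation has finished
                break
            step, field = item
            # Placeholder for a heavier analysis (e.g., topological segmentation).
            hist, _ = np.histogram(field, bins=16)
            print(f"[in-transit] step {step}: histogram max bin count = {hist.max()}")


    def run_simulation(num_steps=5, field_size=100_000):
        queue = mp.Queue(maxsize=4)  # bounded buffer, a crude stand-in for staging memory
        staging = mp.Process(target=in_transit_analysis, args=(queue,))
        staging.start()

        rng = np.random.default_rng(0)
        for step in range(num_steps):
            field = rng.normal(size=field_size)  # stand-in for one simulation time step
            # In-situ: a cheap reduction computed on the primary resource itself.
            print(f"[in-situ]    step {step}: mean={field.mean():.4f}, std={field.std():.4f}")
            # In-transit: hand the raw field off asynchronously for heavier work.
            queue.put((step, field))

        queue.put(None)  # signal completion to the staging process
        staging.join()


    if __name__ == "__main__":
        run_simulation()

In this toy setup, the bounded queue plays the role of back-pressure that a real staging area would impose when the in-transit resources fall behind the simulation; the real framework instead relies on asynchronous data transfers and a scheduler to coordinate the analysis workflows.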