Driving scientific applications by data in distributed environments

  • Authors:
  • Joel Saltz;Umit Catalyurek;Tahsin Kurc;Mike Gray;Shannon Hastings;Steve Langella;Sivaramakrishnan Narayanan;Ryan Martino;Steven Bryant;Malgorzata Peszynska;Mary Wheeler;Alan Sussman;Michael Beynon;Christian Hansen;Don Stredney;Dennis Sessanna

  • Affiliations:
  • Department of Biomedical Informatics, The Ohio State University;Department of Biomedical Informatics, The Ohio State University;Department of Biomedical Informatics, The Ohio State University;Department of Biomedical Informatics, The Ohio State University;Department of Biomedical Informatics, The Ohio State University;Department of Biomedical Informatics, The Ohio State University;Department of Biomedical Informatics, The Ohio State University;Center for Subsurface Modeling, The University of Texas at Austin;Center for Subsurface Modeling, The University of Texas at Austin;Center for Subsurface Modeling, The University of Texas at Austin;Center for Subsurface Modeling, The University of Texas at Austin;Department of Computer Science, University of Maryland;Department of Computer Science, University of Maryland;Department of Computer Science, University of Maryland;Interface Laboratory, The Ohio Supercomputer Center;Interface Laboratory, The Ohio Supercomputer Center

  • Venue:
  • ICCS'03 Proceedings of the 2003 international conference on Computational science
  • Year:
  • 2003


Abstract

Traditional simulation-based applications for exploring a parameter space to understand a physical phenomenon or to optimize a design are rapidly overwhelmed by data volume when large numbers of simulations with different parameters are carried out. Optimizing reservoir management through simulation-based studies, in which large numbers of realizations are computed using detailed geologic descriptions, is an example of such an application. In this paper, we describe a software architecture to facilitate large-scale simulation studies involving ensembles of long-running simulations and analysis of vast volumes of output data. This architecture is built on top of two frameworks we have developed: IPARS and DataCutter. These frameworks make it possible to implement tools and applications that run large-scale simulations and generate and investigate terabyte-scale datasets efficiently.
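The workflow the abstract describes — run an ensemble of simulations over a parameter grid, then push the outputs through analysis filters — can be illustrated with a minimal sketch. This is a hypothetical illustration only, not the IPARS or DataCutter API; the simulation model, filter names, and parameters are all invented stand-ins for the filter-stream style of processing the paper's architecture targets.

```python
# Hypothetical sketch of an ensemble-study driver (NOT the IPARS/DataCutter
# API): simulate over a parameter grid, then stream each realization through
# a chain of analysis filters, in the spirit of a filter-stream model.
from itertools import product

def simulate(perm, porosity):
    # Stand-in for a long-running reservoir simulation: returns one
    # per-realization record (here, a trivial production estimate).
    return {"perm": perm, "porosity": porosity,
            "production": perm * porosity * 100.0}

def filter_threshold(records, min_production):
    # Filter stage: keep only realizations above a production cutoff.
    return (r for r in records if r["production"] >= min_production)

def filter_rank(records):
    # Analysis stage: rank surviving realizations by production.
    return sorted(records, key=lambda r: r["production"], reverse=True)

def run_ensemble(perms, porosities, min_production):
    # Lazily generate all parameter combinations, then apply the filter chain.
    raw = (simulate(k, phi) for k, phi in product(perms, porosities))
    return filter_rank(filter_threshold(raw, min_production))

best = run_ensemble(perms=[50, 100, 200], porosities=[0.1, 0.2],
                    min_production=2.0)
```

In a real deployment, each `simulate` call would be a distributed, long-running job and the filters would process terabyte-scale output incrementally rather than in memory; the generator-based pipeline above is only a shape-level analogy for that streaming composition.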