High-performance scientific data management system

Authors:
Jaechun No; Rajeev Thakur; Alok Choudhary
Affiliations:
Department of Software Engineering, Sejong University, Seoul, Republic of Korea; Mathematics and Computer Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne (Lemont), IL; Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL
Venue:
Journal of Parallel and Distributed Computing
Year:
2003

Abstract

Many scientific applications have large I/O requirements, in terms of both the size of the data and the number of files or data sets. Managing, storing, efficiently accessing, and analyzing this data is an extremely challenging task. Traditionally, two different solutions have been used for this task: file I/O or databases. File I/O can provide high performance but is tedious to use with large numbers of files and with large and complex data sets. Databases can be convenient, flexible, and powerful but neither perform nor scale well for parallel supercomputing applications. We have developed a software system, called Scientific Data Manager (SDM), that combines the good features of both file I/O and databases. SDM provides a high-level application programming interface to the user and, internally, uses a parallel file system to store the actual data (using various I/O optimizations available in MPI-IO) and a database to store application-related metadata. To support I/O in irregular applications, SDM makes extensive use of MPI-IO's noncontiguous collective I/O functions. Moreover, SDM uses the concept of a history file to reduce the cost of index distribution, using the metadata stored in the database. We describe the design and implementation of SDM and present performance results with two regular applications, ASTRO3D and an Euler solver, and with two irregular applications, a CFD code called FUN3D and a Rayleigh-Taylor instability code.
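
The split described in the abstract, bulk data in files and descriptive metadata in a database, can be illustrated with a minimal single-process sketch. This is not SDM's API: the class name `TinyDataManager`, the `datasets` table, and the flat binary layout are hypothetical. SQLite and an ordinary local file stand in for the database and for the parallel file system accessed through MPI-IO.

```python
import os
import sqlite3
import struct
import tempfile


class TinyDataManager:
    """Hypothetical illustration only: metadata lives in a database,
    raw values live in a flat binary file (a stand-in for MPI-IO)."""

    def __init__(self, data_path):
        self.data_path = data_path
        # In-memory SQLite stands in for the metadata database.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE datasets ("
            "name TEXT, timestep INTEGER, file_offset INTEGER, count INTEGER, "
            "PRIMARY KEY (name, timestep))"
        )
        # Make sure the data file exists so its size can be queried.
        open(self.data_path, "ab").close()

    def write(self, name, timestep, values):
        """Append values to the data file and record where they went."""
        offset = os.path.getsize(self.data_path)
        with open(self.data_path, "ab") as f:
            f.write(struct.pack(f"<{len(values)}d", *values))
        self.db.execute(
            "INSERT INTO datasets VALUES (?, ?, ?, ?)",
            (name, timestep, offset, len(values)),
        )
        self.db.commit()

    def read(self, name, timestep):
        """Look up the location in the database, then read the file."""
        row = self.db.execute(
            "SELECT file_offset, count FROM datasets "
            "WHERE name = ? AND timestep = ?",
            (name, timestep),
        ).fetchone()
        if row is None:
            raise KeyError((name, timestep))
        offset, count = row
        with open(self.data_path, "rb") as f:
            f.seek(offset)
            return list(struct.unpack(f"<{count}d", f.read(count * 8)))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.bin")
    manager = TinyDataManager(path)
    manager.write("pressure", 0, [1.0, 2.5])
    manager.write("pressure", 1, [3.0])
    print(manager.read("pressure", 1))
```

The caller names a data set and a time step and never handles file offsets directly. That is the convenience the abstract attributes to a high-level interface, while the data itself still moves through plain file I/O.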