High-performance scientific data management system

  • Authors:
  • Jaechun No;Rajeev Thakur;Alok Choudhary

  • Affiliations:
  • Department of Software Engineering, Sejona University, Seoul, Republic of Korea;Mathematics and Computer Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne (Lemont), IL;Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Many scientific applications have large I/O requirements, in terms of both the size of data and the number of files or data sets. Management, storage, efficient access, and analysis of this data present an extremely challenging task. Traditionally, two different solutions have been used for this task: file I/O or databases. File I/O can provide high performance but is tedious to use with large numbers of files and large and complex data sets. Databases can be convenient, flexible, and powerful but do not perform and scale well for parallel supercomputing applications. We have developed a software system, called Scientific Data Manager (SDM), that combines the good features of both file I/O and databases. SDM provides a high-level application programming interface to the user and, internally, uses a parallel file system to store real data (using various I/O optimizations available in MPI-IO) and a database to store application-related metadata. In order to support I/O in irregular applications, SDM makes extensive use of MPI-IO's noncontiguous collective I/O functions. Moreover, SDM uses the concept of a history file to optimize the cost of the index distribution using the metadata stored in database. We describe the design and implementation of SDM and present performance results with two regular applications, ASTRO3D and an Euler solver, and with two irregular applications, a CFD code called FUN3D and a Rayleigh-Taylor instability code.