Data management for large-scale scientific computations in high performance distributed systems

  • Authors:
  • A. Choudhary;M. Kandemir;J. No;G. Memik;X. Shen;W. Liao;H. Nagesh;S. More;V. Taylor;R. Thakur;R. Stevens

  • Affiliations:
  • Center for Parallel and Distributed Computing, Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, USA;Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA;Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA;Center for Parallel and Distributed Computing, Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, USA;Center for Parallel and Distributed Computing, Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, USA;Center for Parallel and Distributed Computing, Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, USA;Center for Parallel and Distributed Computing, Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, USA;Center for Parallel and Distributed Computing, Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, USA;Center for Parallel and Distributed Computing, Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, USA;Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA;Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA

  • Venue:
  • Cluster Computing
  • Year:
  • 2000

Abstract

With the increasing number of scientific applications manipulating huge amounts of data, effective high-level data management is an increasingly important problem. Unfortunately, existing solutions to the high-level data management problem either require a deep understanding of specific storage architectures and file layouts (as in high-performance file storage systems) or deliver unsatisfactory I/O performance in exchange for ease of use and portability (as in relational DBMSs). In this paper we present a novel application development environment built around an active meta-data management system (MDMS) to handle high-level data effectively. The key components of our three-tiered architecture are the user application, the MDMS, and a hierarchical storage system (HSS). Our environment overcomes the performance problems of pure database-oriented solutions while maintaining their advantages in ease of use and portability. The MDMS achieves high performance with the aid of user-specified, performance-oriented directives. Our environment supports a simple, easy-to-use, yet powerful user interface, leaving the task of choosing appropriate I/O techniques for the application at hand to the MDMS. We discuss the importance of an active MDMS and show how the three components of our environment, namely the application, the MDMS, and the HSS, fit together. We also report performance numbers from our ongoing implementation and illustrate that significant improvements are possible without undue programming effort.
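
The division of labor described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the class names, the directive values ("sequential", "strided"), the storage tiers, and the strategy labels are all hypothetical, chosen only to show an application supplying a directive and a metadata layer choosing an I/O technique on its behalf.

```python
from dataclasses import dataclass


@dataclass
class DatasetMeta:
    """Hypothetical metadata record kept for one dataset."""
    name: str
    access_pattern: str  # user-specified directive (assumed values)
    storage_tier: str    # where the storage hierarchy holds the data (assumed)


class MDMS:
    """Toy metadata manager: records directives, then picks an I/O strategy."""

    def __init__(self):
        self._catalog = {}

    def register(self, name, access_pattern, storage_tier="disk"):
        # The application only states what it knows about its access pattern.
        self._catalog[name] = DatasetMeta(name, access_pattern, storage_tier)

    def choose_io(self, name):
        # The strategy labels below are illustrative assumptions.
        meta = self._catalog[name]
        if meta.storage_tier == "tape":
            return "stage-to-disk-then-read"
        if meta.access_pattern == "strided":
            return "collective-io"
        return "sequential-prefetch"


mdms = MDMS()
mdms.register("temperature", access_pattern="strided")
strategy = mdms.choose_io("temperature")
```

The point of the sketch is the interface boundary: the application never names an I/O technique, so the selection logic can change without touching application code.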