A Distributed Multi-Storage Resource Architecture and I/O Performance Prediction for Scientific Computing

  • Authors:
  • X. Shen; A. Choudhary; C. Matarazzo; P. Sinha

  • Affiliations:
  • Center for Parallel and Distributed Computing, Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, USA; Center for Parallel and Distributed Computing, Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, USA; Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, CA 94550, USA; Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, CA 94550, USA

  • Venue:
  • Cluster Computing
  • Year:
  • 2003

Abstract

I/O-intensive applications pose great challenges to computational scientists. A major problem is that, in a conventional computing environment, users must sacrifice performance in order to satisfy storage capacity requirements. Even when state-of-the-art I/O optimizations are employed, further performance improvement is impeded by the physical nature of the storage media.

In this paper, we present a distributed multi-storage resource architecture that can satisfy both performance and capacity requirements by employing multiple storage resources. Compared to a traditional single storage resource architecture, our architecture provides a more flexible and reliable computing environment. It opens new opportunities for high-performance computing while inheriting state-of-the-art I/O optimization approaches that have already been developed, and it gives application users high-performance storage access even when they do not have a single large local storage archive at their disposal. We also develop an Application Programming Interface (API) that provides transparent management of and access to the various storage resources in our computing environment. Since I/O usually dominates performance in I/O-intensive applications, we establish an I/O performance prediction mechanism, consisting of a performance database and a prediction algorithm, to help users better evaluate and schedule their applications. We also develop a tool that helps users automatically generate the performance data stored in the database. Experiments show that our multi-storage resource architecture is a promising platform for high-performance distributed computing.
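The abstract does not specify how the performance database or the prediction algorithm is organized. As a minimal sketch only, one plausible shape is a table of measured timings per storage resource and operation, with the predictor interpolating between measurements and summing over an application's I/O calls. Every name here (`PERF_DB`, `predict_time`, `predict_total`), the resource labels, and all timing values are hypothetical placeholders, not the paper's API or data.

```python
from bisect import bisect_left

# Hypothetical performance database (an assumption, not the paper's schema):
# (resource, operation) -> list of (request_size_bytes, measured_seconds),
# sorted by size. The numbers are made-up placeholders.
PERF_DB = {
    ("local_disk", "write"): [(1_000_000, 0.02), (10_000_000, 0.2)],
    ("remote_tape", "write"): [(1_000_000, 1.0), (10_000_000, 3.0)],
}


def predict_time(resource, operation, size_bytes):
    """Estimate the time of one I/O call by linear interpolation.

    Sizes outside the measured range are extrapolated from the nearest
    pair of measurements. Requires at least two samples per entry.
    """
    samples = PERF_DB[(resource, operation)]
    sizes = [size for size, _ in samples]
    # Index of the upper end of the bracketing segment, clamped so that
    # out-of-range sizes reuse the first or last segment.
    hi = min(max(bisect_left(sizes, size_bytes), 1), len(samples) - 1)
    (x0, y0), (x1, y1) = samples[hi - 1], samples[hi]
    return y0 + (y1 - y0) * (size_bytes - x0) / (x1 - x0)


def predict_total(io_plan):
    """Sum predicted times over (resource, operation, size_bytes, count) entries."""
    return sum(
        count * predict_time(resource, operation, size)
        for resource, operation, size, count in io_plan
    )
```

A user could compare candidate placements by calling `predict_total` once per placement and choosing the smaller estimate. A real predictor would likely also need to account for factors this sketch ignores, such as network conditions and concurrency.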