Cooperative Client-Side File Caching for MPI Applications

  • Authors:
  • Wei-Keng Liao; Kenin Coloma; Alok Choudhary; Lee Ward

  • Affiliations:
  • Electrical Engineering and Computer Science Department, Northwestern University, Evanston, IL 60208, USA (Liao, Coloma, Choudhary); Scalable Computing Systems Department, Sandia National Laboratories, Albuquerque, NM 87185, USA (Ward)

  • Venue:
  • International Journal of High Performance Computing Applications
  • Year:
  • 2007

Abstract

Client-side file caching is one of many I/O strategies that were initially designed for distributed systems and have since been adopted by today's parallel file systems. Most of these implementations treat each client independently because clients' computations are seldom related to each other in a distributed environment. However, it is misguided to apply the same assumption directly to high-performance computers, where many parallel I/O operations come from a group of processes working within the same parallel application. File caching can therefore perform more effectively if the scope of processes sharing the same file is known. In this paper, we propose a client-side file caching system for MPI applications that perform parallel I/O operations on shared files. In our design, an I/O thread is created in each MPI process and runs concurrently with the main thread. The MPI processes that collectively open a shared file use their I/O threads to cooperate with each other in handling file requests, cache page access, and coherence control. Because the caching subsystem is brought closer to the applications as a user-space library, it can be incorporated into an MPI I/O implementation, which increases its portability. Performance evaluations using three I/O benchmarks demonstrate a significant improvement over traditional methods that either use byte-range file locking or rely on coherent I/O provided by the file system.
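The design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it simulates MPI processes as Python objects inside one program, uses in-memory queues in place of MPI messages, and assumes a round-robin assignment of cache pages to processes with a single cached copy per page. The abstract states none of these details. The names `SimulatedRank`, `PAGE_SIZE`, and `demo` are hypothetical, and eviction and write-back to the file are omitted.

```python
import queue
import threading

PAGE_SIZE = 4  # bytes per cache page (tiny, for illustration only)


class SimulatedRank:
    """One simulated MPI process: a main thread plus an I/O thread."""

    def __init__(self, rank, peers, backing_file):
        self.rank = rank
        self.peers = peers                # shared list of all simulated ranks
        self.backing_file = backing_file  # bytearray standing in for the shared file
        self.pages = {}                   # page index -> bytearray cached on this rank
        self.requests = queue.Queue()     # stands in for incoming MPI messages
        self.io_thread = threading.Thread(target=self._serve, daemon=True)
        self.io_thread.start()

    def _serve(self):
        # I/O thread: serve page requests from any rank until a None sentinel arrives.
        while True:
            item = self.requests.get()
            if item is None:
                break
            op, page, offset, payload, reply = item
            if page not in self.pages:
                # Cache miss: load the page from the shared file.
                start = page * PAGE_SIZE
                self.pages[page] = bytearray(self.backing_file[start:start + PAGE_SIZE])
            buf = self.pages[page]
            if op == "write":
                buf[offset:offset + len(payload)] = payload  # payload is bytes
                reply.put(None)
            else:
                reply.put(bytes(buf[offset:offset + payload]))  # payload is a length

    def _request(self, op, page, offset, payload):
        # Assumed policy: page p is cached only on rank p mod nprocs, so there is
        # exactly one copy of each page and no stale replicas to invalidate.
        owner = self.peers[page % len(self.peers)]
        reply = queue.Queue(maxsize=1)
        owner.requests.put((op, page, offset, payload, reply))
        return reply.get()

    def write(self, pos, data):
        # Split the byte range at page boundaries and forward each piece.
        while data:
            page, offset = divmod(pos, PAGE_SIZE)
            chunk = data[:PAGE_SIZE - offset]
            self._request("write", page, offset, chunk)
            pos += len(chunk)
            data = data[len(chunk):]

    def read(self, pos, length):
        out = b""
        while length > 0:
            page, offset = divmod(pos, PAGE_SIZE)
            n = min(length, PAGE_SIZE - offset)
            out += self._request("read", page, offset, n)
            pos += n
            length -= n
        return out

    def close(self):
        self.requests.put(None)
        self.io_thread.join()


def demo():
    backing = bytearray(b"." * 16)
    ranks = []
    for r in range(2):
        ranks.append(SimulatedRank(r, ranks, backing))
    ranks[0].write(2, b"HELLO")   # spans pages 0 and 1
    seen = ranks[1].read(0, 8)    # another rank reads the same byte range
    cached = [sorted(r.pages) for r in ranks]
    for r in ranks:
        r.close()
    return seen, cached
```

Calling `demo()` has one simulated process write a range that crosses a page boundary and another process read it back through the owners' I/O threads. Because each page has exactly one cached copy under the assumed policy, the reader sees the writer's data without any file locking; a real system would also need eviction, write-back, and a message layer such as MPI point-to-point communication.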