Fault tolerant file models for MPI-IO parallel file systems

Authors:
A. Calderón (1); F. García-Carballeira (1); Florin Isaila (1); Rainer Keller (2); Alexander Schulz (2)
Affiliations:
(1) Computer Architecture Group, Computer Science Department, Universidad Carlos III de Madrid, Leganés, Madrid, Spain
(2) High Performance Computing Center Stuttgart, Universität Stuttgart, Stuttgart, Germany
Venue:
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Year:
2007

References:
A case for redundant arrays of inexpensive disks (RAID). SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
On implementing MPI-IO portably and with high performance. Proceedings of the sixth workshop on I/O in parallel and distributed systems
Improved Read Performance in a Cost-Effective, Fault-Tolerant Parallel Virtual File System (CEFT-PVFS). CCGRID '03 Proceedings of the 3rd International Symposium on Cluster Computing and the Grid
RAID-x: A New Distributed Disk Array for I/O-Centric Cluster Computing. HPDC '00 Proceedings of the 9th IEEE International Symposium on High Performance Distributed Computing
A Fault Tolerant MPI-IO Implementation using the Expand Parallel File System. PDP '05 Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing

Abstract

Parallelism in file systems is obtained by using several independent server nodes, each supporting one or more secondary storage devices. This approach increases the performance and scalability of the system, but a fault in a single node can make the whole system fail. To avoid this problem, data must be stored using some kind of redundancy technique, so that it can be recovered in case of failure. Fault tolerance can be provided in I/O systems by using replication or RAID-based schemes. However, most current systems apply a single fault tolerance technique at the disk or file system level. This paper describes how fault tolerance support can be used by MPI applications based on PVFS version 2 [1], a well-known parallel file system for clusters. This support can be applied to other parallel file systems with many benefits: fault tolerance at the file level, flexible definition of new fault tolerance schemes, and dynamic reconfiguration of the fault tolerance policy.
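
As a rough illustration of the RAID-based redundancy mentioned in the abstract, the following Python sketch stripes one block of data per server and adds an XOR parity block, so that the block of any single failed server can be rebuilt from the survivors. It is a generic, RAID-5-style example and not the file models proposed in the paper; the function names, the single-stripe layout, and the choice of the last server as parity holder are all assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, num_servers, block_size):
    """Split data into one block per data server and append a parity block.

    Hypothetical layout: servers 0..num_servers-1 hold data and one extra
    server holds the parity. Only a single stripe is handled.
    """
    if len(data) > num_servers * block_size:
        raise ValueError("data does not fit in a single stripe")
    padded = data.ljust(num_servers * block_size, b"\0")
    blocks = [padded[i * block_size:(i + 1) * block_size]
              for i in range(num_servers)]
    return blocks + [xor_blocks(blocks)]


def recover(stripe, failed):
    """Rebuild the block of one failed server from the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != failed]
    return xor_blocks(survivors)


# Three data servers plus one parity server; server 1 is assumed to fail.
stripe = stripe_with_parity(b"parallel file data", num_servers=3, block_size=6)
lost = 1
rebuilt = recover(stripe, lost)
assert rebuilt == stripe[lost]
```

Replication would instead store a full copy of each block on a second server, trading extra storage for simpler recovery. A per-file choice between such schemes is the kind of flexibility the abstract refers to.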