Fault tolerant file models for parallel file systems: introducing distribution patterns for every file

Authors:
A. Calderón;F. García-Carballeira;L. M. Sánchez;J. D. García;J. Fernandez
Affiliations:
Computer Architecture Group, Computer Science Department, Universidad Carlos III de Madrid, Leganés, Spain (all authors)
Venue:
The Journal of Supercomputing
Year:
2009

Abstract

Parallel file systems obtain parallelism by using several independent server nodes, each supporting one or more secondary storage devices. This approach increases the performance and scalability of the system, but a fault in a single node can stop the whole system. To avoid this problem, data must be stored redundantly, so that any data held on a faulty element can be recovered. Fault tolerance can be provided in I/O systems by using replication or RAID-based schemes. However, most current systems apply the same technique to all files in the system.

This paper describes the fault tolerance support provided by Expand, a parallel file system based on standard servers. This support can be applied to other parallel file systems and offers several benefits: fault tolerance at the file level, flexible definition of the fault tolerance scheme to be used, the possibility of changing the fault-tolerant support used for a file, etc.
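To make the idea of choosing a redundancy scheme per file more concrete, the following is a minimal Python sketch, not Expand's actual interface. It assumes an in-memory model in which each server is a list of blocks, a hypothetical block size `BLOCK`, and two illustrative patterns: full replication and a RAID 4 style layout with one XOR parity block per stripe. All function names here are invented for illustration.

```python
from functools import reduce

BLOCK = 4  # hypothetical block size in bytes, kept tiny for illustration


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def split_blocks(data, size=BLOCK):
    """Split data into fixed-size blocks, zero-padding the last one."""
    padded = data + b"\0" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def write_replicated(data, n_servers):
    """Replication pattern: every server holds a full copy of the file's blocks."""
    blocks = split_blocks(data)
    return [list(blocks) for _ in range(n_servers)]


def write_parity(data, n_servers):
    """Parity pattern: each stripe is n_servers - 1 data blocks plus one
    XOR parity block stored on the last server (RAID 4 style)."""
    blocks = split_blocks(data)
    k = n_servers - 1
    # Pad with zero blocks so the file fills a whole number of stripes.
    blocks += [b"\0" * BLOCK] * (-len(blocks) % k)
    servers = [[] for _ in range(n_servers)]
    for start in range(0, len(blocks), k):
        stripe = blocks[start:start + k]
        for i, block in enumerate(stripe):
            servers[i].append(block)
        servers[k].append(xor_blocks(stripe))
    return servers


def rebuild_server(servers, failed):
    """Rebuild one failed server of a parity-pattern file by XOR-ing the
    corresponding blocks of all surviving servers."""
    survivors = [srv for i, srv in enumerate(servers) if i != failed]
    return [xor_blocks(row) for row in zip(*survivors)]


# Hypothetical per-file policy: each file name maps to its own pattern.
patterns = {"results.dat": write_parity, "config.ini": write_replicated}
layout = {name: place(b"parallel file systems", 4)
          for name, place in patterns.items()}
```

In this sketch, losing any single server of `layout["results.dat"]` can be repaired with `rebuild_server`, while `layout["config.ini"]` survives the loss of all but one server at the cost of storing the data four times. Switching a file from one pattern to the other would amount to rewriting it with the other function, which is only one possible reading of the "change the fault-tolerant support used for a file" benefit mentioned above.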