Design, implementation and performance evaluation of a cost-effective, fault-tolerant parallel virtual file system

Authors:
Yifeng Zhu;Hong Jiang;Xiao Qin;Dan Feng;David R. Swanson
Affiliations:
University of Nebraska, Lincoln, NE;University of Nebraska, Lincoln, NE;University of Nebraska, Lincoln, NE;Huazhong University of Science and Technology, Wuhan, China;University of Nebraska, Lincoln, NE
Venue:
SNAPI '03 Proceedings of the international workshop on Storage network architecture and parallel I/Os
Year:
2003

Abstract

Fault tolerance is one of the most important issues for parallel file systems. This paper presents the design, implementation and performance evaluation of a cost-effective, fault-tolerant parallel virtual file system (CEFT-PVFS). CEFT-PVFS provides parallel I/O service without requiring any additional hardware, by using the existing commodity disks on cluster nodes, and incorporates fault tolerance in the form of disk mirroring. While mirroring is a straightforward idea, we have implemented this open-source system and conducted extensive experiments to evaluate the feasibility, efficiency and scalability of this fault-tolerant approach on one of the largest current clusters, where the issues of data consistency and recovery are also investigated. Four mirroring protocols are proposed, distinguished by whether the fault-tolerant operations are client driven or server driven, and synchronous or asynchronous. Their relative merits are assessed by comparing their write performance, measured on real systems, with their reliability and availability, obtained through analytical modeling. The results indicate that, in cluster environments, mirroring can improve reliability by a factor of over 40 (4000%) while reducing peak write performance by 33% to 58% when both systems are of identical size (i.e., counting the 50% mirroring disks in the mirrored system). In addition, protocols with higher peak write performance are less reliable than those with lower peak write performance; the latter achieve higher reliability and availability at the expense of some write bandwidth. A hybrid protocol is proposed to optimize this tradeoff.
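
The two-by-two protocol space named in the abstract (client driven or server driven, synchronous or asynchronous) can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not the CEFT-PVFS implementation: the names `Driver`, `Timing`, `MirroredPair` and `run_all` are hypothetical, and it assumes that "synchronous" means the write is acknowledged only after both copies are stored, while "asynchronous" means it is acknowledged after the primary copy alone and the mirror is updated later.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import product


class Driver(Enum):
    CLIENT = "client"  # assumption: the client itself sends the duplicate copy
    SERVER = "server"  # assumption: the primary server forwards the duplicate


class Timing(Enum):
    SYNC = "synchronous"    # assumption: ack only after both copies are stored
    ASYNC = "asynchronous"  # assumption: ack after the primary copy only


@dataclass
class Node:
    """A storage node, reduced to a dictionary of blocks."""
    blocks: dict = field(default_factory=dict)


@dataclass
class MirroredPair:
    """One primary node and its mirror under a given (hypothetical) protocol."""
    driver: Driver
    timing: Timing
    primary: Node = field(default_factory=Node)
    mirror: Node = field(default_factory=Node)
    pending: list = field(default_factory=list)  # writes not yet mirrored
    client_sends: int = 0     # data messages sent by the client
    server_forwards: int = 0  # data messages forwarded by the primary

    def write(self, key, data):
        """Store on the primary; mirror now (SYNC) or queue it (ASYNC)."""
        self.primary.blocks[key] = data
        self.client_sends += 1
        if self.timing is Timing.SYNC:
            self._copy_to_mirror(key, data)
        else:
            self.pending.append((key, data))
        return "ack"

    def _copy_to_mirror(self, key, data):
        # Count who pays for the duplicate transfer.
        if self.driver is Driver.CLIENT:
            self.client_sends += 1
        else:
            self.server_forwards += 1
        self.mirror.blocks[key] = data

    def flush(self):
        """Complete all deferred mirror updates."""
        while self.pending:
            self._copy_to_mirror(*self.pending.pop(0))

    def consistent(self):
        return self.primary.blocks == self.mirror.blocks


def run_all(key="stripe-0", data=b"payload"):
    """Issue one write under each of the four protocol combinations."""
    results = {}
    for driver, timing in product(Driver, Timing):
        pair = MirroredPair(driver, timing)
        pair.write(key, data)
        consistent_at_ack = pair.consistent()
        pair.flush()
        results[(driver, timing)] = (
            consistent_at_ack,
            pair.consistent(),
            pair.client_sends,
            pair.server_forwards,
        )
    return results


if __name__ == "__main__":
    for (driver, timing), outcome in run_all().items():
        print(driver.value, timing.value, outcome)
```

In this model, the asynchronous variants return an acknowledgement while the mirror is still stale, which is one plausible reading of why the faster protocols in the abstract are the less reliable ones: a primary failure inside that window loses the only copy. The driver choice changes only who carries the duplicate transfer, the client's link or the server-to-server link.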