A high availability mechanism for parallel file system

Authors:
Hu Zhang; Weiguo Wu; Xiaoshe Dong; Depei Qian
Affiliations:
Department of Computer Science, Xi’an Jiaotong Univ., Xi’an, Shaanxi, China (all authors)
Venue:
APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
Year:
2005

Abstract

Parallel file systems achieve high I/O throughput by dividing a file into multiple blocks and storing them on multiple I/O nodes. However, striping file data over multiple I/O nodes sacrifices the reliability and availability of the file system. In this study, a new mechanism named Logic Mirror Ring (LMR) has been developed to improve the reliability and availability of parallel file systems. A logic mirror ring is built over all I/O nodes to indicate the mirror relationship among the nodes, i.e., each node maintains not only its own data but also the mirror data of other nodes. The fault-tolerant capability of the system is improved because the node maintaining the mirror data of a failed node takes over the requests to the failed node. The mirror depth can be adjusted to different levels based on the reliability and availability requirements. A model is developed to evaluate the reliability and availability of parallel file systems, and the effects of LMR on the reliability and availability of the parallel file system are studied. The results show that LMR can be used to improve the reliability and availability of parallel file systems effectively.
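
The mirror-ring idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how mirror holders are chosen or in which direction the ring runs, so the code below assumes that the data of node i is mirrored on its next `depth` successors in the ring. The function names (`mirror_holders`, `route_request`, `block_unavailability`) are hypothetical, and the unavailability formula assumes independent node failures; it is not the evaluation model developed in the paper.

```python
from typing import List, Optional, Set


def mirror_holders(node: int, num_nodes: int, depth: int) -> List[int]:
    """Nodes assumed to hold mirror copies of `node`'s data.

    Assumption: mirrors live on the next `depth` successors in the ring.
    """
    if not 0 <= depth < num_nodes:
        raise ValueError("depth must be in [0, num_nodes)")
    return [(node + k) % num_nodes for k in range(1, depth + 1)]


def route_request(node: int, num_nodes: int, depth: int,
                  failed: Set[int]) -> Optional[int]:
    """Pick the node that serves a request addressed to `node`.

    The owner is tried first, then its mirror holders in ring order.
    Returns None when the owner and all of its mirror holders have failed.
    """
    for candidate in [node] + mirror_holders(node, num_nodes, depth):
        if candidate not in failed:
            return candidate
    return None


def block_unavailability(p_fail: float, depth: int) -> float:
    """Probability that a block is unreachable, assuming each node fails
    independently with probability `p_fail` (illustrative assumption only)."""
    return p_fail ** (depth + 1)


if __name__ == "__main__":
    # Four I/O nodes, mirror depth 1: node 2 takes over for failed node 1.
    print(route_request(1, num_nodes=4, depth=1, failed={1}))
    # With nodes 1 and 2 both down, depth 1 is not enough but depth 2 is.
    print(route_request(1, num_nodes=4, depth=1, failed={1, 2}))
    print(route_request(1, num_nodes=4, depth=2, failed={1, 2}))
```

Under these assumptions, raising the mirror depth trades extra storage on each node for tolerance of more simultaneous failures among adjacent ring nodes, which matches the adjustable mirror depth mentioned in the abstract.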