DPCT: distributed parity cache table for redundant parallel file system

Authors:
Sheng-Kai Hung; Yarsun Hsu
Affiliations:
Department of Electrical Engineering, National Tsing-Hua University, HsinChu, Taiwan; Department of Electrical Engineering, National Tsing-Hua University, HsinChu, Taiwan
Venue:
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Year:
2006

Abstract

Using parity information to protect data from loss in a parallel file system is a straightforward and cost-effective method. However, the “small-write” phenomenon can lead to poor write performance. This remains true in the distributed paradigm, even when a file system cache is used, because the local file system knows nothing about stripes and therefore cannot benefit from the related blocks of a stripe. We propose a distributed parity cache table (DPCT), which knows the related blocks of a stripe and uses them to speed up parity calculation and parity updating. This high-level cache benefits from previous reads and aggregates small writes to improve overall performance. We implement this mechanism in our reliable parallel file system (RPFS). The experimental results show that DPCT support improves both read and write performance, because DPCT reduces the number of disk accesses. This matches our quantitative analysis, which shows that the number of disk accesses can be reduced from N to N(1 − H), where N is the number of I/O nodes and H is the DPCT hit ratio.
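The parity update and the access-count estimate mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the RPFS or DPCT implementation: the function names (`xor_blocks`, `small_write_parity`, `expected_disk_accesses`) are hypothetical, the parity is assumed to be plain bytewise XOR across the blocks of a stripe, and the access model simply assumes that each of the N block accesses is served from the cache with probability H.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks (assumed XOR parity scheme)."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def small_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write parity update for a single changed block.

    XOR-ing the old data out of the parity and the new data in gives the
    same result as recomputing the parity over the whole stripe.
    """
    return xor_blocks(old_parity, old_data, new_data)


def expected_disk_accesses(n_io_nodes: int, hit_ratio: float) -> float:
    """Expected disk accesses under the assumed model N * (1 - H)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return n_io_nodes * (1.0 - hit_ratio)


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]  # hypothetical data blocks
    parity = xor_blocks(*stripe)
    new_block = b"\xff\x00"
    updated = small_write_parity(parity, stripe[1], new_block)
    # The incremental update matches a full recomputation over the new stripe.
    print(updated == xor_blocks(stripe[0], new_block, stripe[2]))
    print(expected_disk_accesses(8, 0.75))
```

In this sketch the small write needs the old data block and the old parity block before the new parity can be written. Those are the reads that a stripe-aware cache could serve from memory instead of from disk, which is where the factor (1 − H) in the estimate would come from under the stated assumption.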