Reliable Cluster Computing with a New Checkpointing RAID-x Architecture

  • Authors:
  • Kai Hwang; Hai Jin; Roy Ho; Wonwoo Ro

  • Venue:
  • HCW '00 Proceedings of the 9th Heterogeneous Computing Workshop
  • Year:
  • 2000

Abstract

In a serverless cluster of PCs or workstations, the cluster must allow remote file access or parallel I/O to be performed directly over disks distributed across all client nodes. We introduce a new distributed disk array, called RAID-x, for use in serverless clusters. The RAID-x architecture is based on an orthogonal striping and mirroring (OSM) scheme, which exploits full bandwidth and protects the system from all single-disk failures. In experiments in a Linux cluster environment, RAID-x outperforms RAID-1 and NFS. We also propose a new striped checkpointing scheme that leverages striped parallelism and pipelined writing of successive disk stripes. The RAID-x architecture greatly enhances the throughput, reliability, and availability of scalable clusters. It appeals especially to I/O-centric cluster applications.
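
The abstract does not give the OSM block mapping, so the following is only a minimal sketch of the general idea, under stated assumptions: data blocks of a stripe are spread across disks, while the mirror images of that stripe are grouped on one other disk, so no block shares a disk with its own image. The stripe width of n - 1, the rotation rule, and the names place_block and survives_failure are hypothetical choices made for illustration and should not be read as the paper's actual layout.

```python
from typing import Tuple


def place_block(block: int, n_disks: int) -> Tuple[int, int]:
    """Return (data_disk, image_disk) for a logical block.

    Hypothetical layout, not the paper's mapping: each stripe holds
    n_disks - 1 data blocks, and all images of a stripe share one disk.
    """
    if n_disks < 2:
        raise ValueError("mirroring needs at least two disks")
    width = n_disks - 1                    # data blocks per stripe (assumption)
    stripe, offset = divmod(block, width)
    image_disk = stripe % n_disks          # images of one stripe are clustered here
    # Data blocks go on the remaining disks, so they never land on image_disk.
    data_disk = (image_disk + 1 + offset) % n_disks
    return data_disk, image_disk


def survives_failure(n_blocks: int, n_disks: int, failed: int) -> bool:
    """True if every block keeps at least one copy when disk `failed` is lost."""
    for block in range(n_blocks):
        data_disk, image_disk = place_block(block, n_disks)
        if data_disk == failed and image_disk == failed:
            return False
    return True
```

Under these assumptions, survives_failure(n_blocks, n_disks, d) holds for every disk d, which mirrors the single-disk fault tolerance claimed in the abstract; it says nothing about the bandwidth or checkpointing results.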