A cost effective fault-tolerant scheme for RAIDs

  • Authors:
  • Liang Fang;XiCheng Lu

  • Affiliations:
  • School of Computer, National University of Defense Technology, Changsha 410073, P.R. China;School of Computer, National University of Defense Technology, Changsha 410073, P.R. China

  • Venue:
  • Journal of Computer Science and Technology
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

The rapid progress in mass storage technology has made it possible for designers to implement large data storage systems for a variety of applications. One of the efficient ways to build large storage systems is to use RAIDs as basic storage modules. In general, the data can be recovered in RAIDs only when one error occurs. But in large RAIDs systems, the fault probability will increase when the number of disks increases, and the use of disks with big storage capacity will cause the recovering time to prolong, thus the probability of the second disk's fault will increase. Therefore, it is necessary to develop methods to recover data when two or more errors have occurred. In this paper, a fault tolerant scheme is proposed based on extended Reed-Solomon code, a recovery procedure is designed to correct up to two errors which is implemented by software and hardware together, and the scheme is verified by computer simulation. In this scheme, only two redundant disks are used to recover up to two disks' fault. The encoding and decoding methods, and the implementation based on software and hardware are described. The application of the scheme in software RAIDs that are built in cluster computers are also described. Compared with the existing methods such as EVENODD and DH, the proposed scheme has distinct improvement in implementation and redundancy.