Tolerating multiple failures in RAID architectures with optimal storage and uniform declustering

Authors:
Guillermo A. Alvarez;Walter A. Burkhard;Flaviu Cristian
Affiliations:
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA;Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA;Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA
Venue:
Proceedings of the 24th annual international symposium on Computer architecture
Year:
1997




Abstract

We present DATUM, a novel method for tolerating multiple disk failures in disk arrays. DATUM is the first known method that can mask any given number of failures, requires an optimal amount of redundant storage space, and spreads reconstruction accesses uniformly over disks in the presence of failures without needing large layout tables in controller memory. Our approach is based on information dispersal, a coding technique that admits an efficient hardware implementation. As the method does not restrict the configuration parameters of the disk array, many existing RAID organizations are particular cases of DATUM. A detailed performance comparison with two other approaches shows that DATUM's response times are similar to those of the best competitor when two or fewer disks fail, and that the performance degrades gracefully when more than two disks fail.
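The abstract names information dispersal as the underlying coding technique. The general idea can be illustrated with a minimal sketch: m data symbols are dispersed into n fragments, and any m surviving fragments suffice to recover the data, so n - m failures are masked with n - m symbols of redundancy. This is not DATUM's actual code or layout. The prime field GF(257), the Vandermonde-style encoding, and the names `P`, `encode`, and `decode` are assumptions chosen for readability; a hardware implementation would likely use a different field and a different matrix.

```python
P = 257  # prime modulus (assumption for illustration; symbols are 0..256)

def encode(data, n):
    """Disperse m data symbols into n fragments.

    Fragment i is the polynomial whose coefficients are the data symbols,
    evaluated at x = i + 1 (mod P). Requires n < P so the points are distinct.
    """
    return [sum(d * pow(x, j, P) for j, d in enumerate(data)) % P
            for x in range(1, n + 1)]

def decode(fragments, m):
    """Recover m data symbols from any m surviving (index, value) pairs.

    Solves the m-by-m Vandermonde system with Gauss-Jordan elimination mod P.
    """
    # Augmented matrix: one row per surviving fragment.
    rows = [[pow(i + 1, j, P) for j in range(m)] + [v]
            for i, v in fragments[:m]]
    for col in range(m):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Normalize the pivot row using the modular inverse (Fermat).
        inv = pow(rows[col][col], P - 2, P)
        rows[col] = [(a * inv) % P for a in rows[col]]
        # Eliminate this column from every other row.
        for r in range(m):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P
                           for a, b in zip(rows[r], rows[col])]
    return [rows[r][m] for r in range(m)]

# Hypothetical usage: 5 data symbols on an 8-disk stripe, 3 disks lost.
data = [68, 65, 84, 85, 77]
frags = encode(data, 8)
survivors = [(i, frags[i]) for i in (0, 2, 3, 6, 7)]  # disks 1, 4, 5 failed
recovered = decode(survivors, 5)
print(recovered == data)
```

The sketch covers only the coding step. It does not address the declustered layout or the uniform spreading of reconstruction accesses that the abstract also claims, and fragment values here can reach 256, so they do not fit in a single byte.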