Coding for High Availability of a Distributed-Parallel Storage System

Authors:
Qutaibah M. Malluhi; William E. Johnston
Affiliations:
Jackson State Univ., Jackson, MS; Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1998

Abstract

We have developed a distributed parallel storage system that uses the aggregate bandwidth of multiple data servers connected by a high-speed wide-area network to achieve scalability and high data throughput. This paper studies schemes for improving the reliability and availability of such network-based distributed storage systems. The general approach employs "erasure" error-correcting codes, which can reconstruct information lost to hardware, software, or human faults. The paper describes the approach and develops optimized algorithms for the encoding and decoding operations. Moreover, the paper presents techniques for reducing the communication and computation overhead incurred while reconstructing missing data from the redundant information. These techniques include clustering, multidimensional coding, and full two-dimensional parity schemes. The paper also considers trade-offs between redundancy, fault tolerance, and complexity of error recovery.
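
As a rough illustration of the erasure-coding idea summarized in the abstract, the Python sketch below protects a set of data blocks with XOR parity and rebuilds a block whose location is known to be missing. It is not the paper's optimized encoding or decoding algorithm. The function names (`xor_blocks`, `encode`, `reconstruct`, `encode_2d`, `recover_via_column`), the sample block contents, and the 2 x 2 grid layout are all assumptions made for this example. The second half arranges blocks in a grid with row and column parity, one simple reading of a two-dimensional parity scheme, to show that a lost block can be rebuilt from its column alone, so fewer servers need to be contacted than with a single parity block over all the data.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a sequence of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Single-parity code: append one block that is the XOR of all data blocks."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stored, missing_index):
    """Rebuild the block at a known missing position (an erasure).

    The XOR of all stored blocks is zero, so the missing block equals
    the XOR of every surviving block.
    """
    survivors = [b for i, b in enumerate(stored) if i != missing_index]
    return xor_blocks(survivors)


def encode_2d(grid):
    """Two-dimensional parity (illustrative layout): one parity per row and per column."""
    row_parity = [xor_blocks(row) for row in grid]
    col_parity = [xor_blocks(col) for col in zip(*grid)]
    return row_parity, col_parity


def recover_via_column(grid, col_parity, r, c):
    """Rebuild grid[r][c] using only the other blocks in column c and that column's parity."""
    others = [grid[i][c] for i in range(len(grid)) if i != r]
    return xor_blocks(others + [col_parity[c]])


# Hypothetical data: three blocks, imagined as living on three different servers.
data = [b"abcd", b"efgh", b"ijkl"]
stored = encode(data)

lost = 1                                  # assume the server holding block 1 is unreachable
recovered = reconstruct(stored, lost)     # reads every other block, including parity

# Hypothetical 2 x 2 grid of blocks with row and column parity.
grid = [[b"ab", b"cd"],
        [b"ef", b"gh"]]
row_parity, col_parity = encode_2d(grid)

# Rebuilding grid[0][0] touches only column 0 (one data block plus one parity block).
recovered_2d = recover_via_column(grid, col_parity, 0, 0)
```

The single-parity version tolerates one erasure per group but must read every surviving block to recover it. In the grid version each data block is covered by two parity groups, which costs more redundancy but allows recovery from smaller groups and from some multi-block losses. That is the kind of trade-off between redundancy, fault tolerance, and recovery cost that the abstract refers to.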