Reliability and availability are increasingly important in large-scale storage systems built from thousands of individual storage devices. Such systems must survive the failure of individual components: with thousands of disks, even a failure that is rare for any single device is likely to occur somewhere in the system. We focus on two types of errors: nonrecoverable read errors and drive failures. We discuss mechanisms for detecting and recovering from such errors, and introduce improved techniques for detecting errors in disk reads and for fast recovery from disk failure. We show that simple RAID cannot guarantee sufficient reliability; our analysis examines the tradeoffs between system availability and storage efficiency among the alternative schemes. Based on our data, we believe that two-way mirroring should be sufficient for most large storage systems. For those that need very high reliability, we recommend either three-way mirroring or mirroring combined with RAID.
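
The scaling argument in the abstract (a failure that is rare per device becomes likely across thousands of devices) can be illustrated with a short calculation. This is only a sketch, not the paper's model: it assumes device failures are independent and identically likely, and the per-device probability of 0.001 per period is a hypothetical figure chosen for illustration, not a value taken from the paper's data. The function name is likewise made up for this example.

```python
def prob_any_failure(n_devices: int, p_single: float) -> float:
    """Probability that at least one of n_devices fails in a given period.

    Assumes failures are independent and every device has the same
    per-period failure probability p_single (a simplifying assumption).
    """
    # P(no device fails) = (1 - p)^n, so P(at least one fails) = 1 - (1 - p)^n.
    return 1.0 - (1.0 - p_single) ** n_devices


if __name__ == "__main__":
    P_SINGLE = 0.001  # hypothetical per-device failure probability per period
    for n in (1, 100, 1_000, 10_000):
        print(f"{n:>6} devices: P(at least one failure) = "
              f"{prob_any_failure(n, P_SINGLE):.4f}")
```

Under these assumptions the probability of seeing at least one failure grows quickly with the number of devices, which is why redundancy has to be planned at the system level rather than per device.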