FAB: building distributed enterprise disk arrays from commodity components

Authors:
Yasushi Saito;Svend Frølund;Alistair Veitch;Arif Merchant;Susan Spence
Affiliations:
Hewlett-Packard Laboratories;Hewlett-Packard Laboratories;Hewlett-Packard Laboratories;Hewlett-Packard Laboratories;Hewlett-Packard Laboratories
Venue:
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Year:
2004

Abstract

This paper describes the design, implementation, and evaluation of a Federated Array of Bricks (FAB), a distributed disk array that provides the reliability of traditional enterprise arrays with lower cost and better scalability. FAB is built from a collection of bricks, small storage appliances containing commodity disks, CPU, NVRAM, and network interface cards. FAB deploys a new majority-voting-based algorithm to replicate or erasure-code logical blocks across bricks and a reconfiguration algorithm to move data in the background when bricks are added or decommissioned. We argue that voting is practical and necessary for reliable, high-throughput storage systems such as FAB. We have implemented a FAB prototype on a 22-node Linux cluster. This prototype sustains 85 MB/second of throughput for a database workload, and 270 MB/second for a bulk-read workload. In addition, it can outperform traditional master-slave replication through performance decoupling and can handle brick failures and recoveries smoothly without disturbing client requests.
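
To make the idea of majority voting over replicated blocks concrete, the following is a minimal sketch of a generic timestamped majority-quorum register. It is not FAB's actual algorithm, which the abstract does not spell out. The names `Brick`, `write`, `read`, and `majority` are hypothetical, each brick holds a single block in memory, and the sketch assumes the caller supplies unique, increasing timestamps. It also omits the write-back step that a full protocol would need for reads to be linearizable under concurrent writes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Brick:
    """Hypothetical stand-in for one storage brick holding a single block."""
    up: bool = True
    ts: int = 0
    value: Optional[bytes] = None

    def store(self, ts: int, value: bytes) -> bool:
        if not self.up:
            return False              # an unreachable brick never acknowledges
        if ts > self.ts:              # keep only the newest version seen so far
            self.ts, self.value = ts, value
        return True

    def load(self) -> Optional[Tuple[int, Optional[bytes]]]:
        return (self.ts, self.value) if self.up else None


def majority(n: int) -> int:
    """Smallest number of bricks that forms a majority of n."""
    return n // 2 + 1


def write(bricks: List[Brick], ts: int, value: bytes) -> bool:
    """Send the new version to every brick; succeed if a majority acknowledges."""
    acks = sum(b.store(ts, value) for b in bricks)
    return acks >= majority(len(bricks))


def read(bricks: List[Brick]) -> Optional[bytes]:
    """Collect replies from a majority and return the highest-timestamp value."""
    replies = [r for r in (b.load() for b in bricks) if r is not None]
    if len(replies) < majority(len(bricks)):
        raise RuntimeError("no majority of bricks reachable")
    return max(replies, key=lambda r: r[0])[1]


if __name__ == "__main__":
    bricks = [Brick() for _ in range(3)]
    bricks[2].up = False                  # one brick is down during the write
    print(write(bricks, 1, b"v1"))        # two of three bricks acknowledge
    bricks[2].up, bricks[0].up = True, False
    print(read(bricks))                   # the read quorum overlaps the write quorum
```

Because any two majorities of the same brick set share at least one brick, a read that reaches a majority sees the latest successfully written version even when a different brick is down at read time. No single brick has to act as a master, which is one way to picture the performance decoupling the abstract mentions.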