Deconstructing Commodity Storage Clusters

Authors:
Haryadi S. Gunawi;Nitin Agrawal;Andrea C. Arpaci-Dusseau;Remzi H. Arpaci-Dusseau;Jiri Schindler
Affiliations:
University of Wisconsin - Madison;University of Wisconsin - Madison;University of Wisconsin - Madison;University of Wisconsin - Madison;EMC Corporation
Venue:
Proceedings of the 32nd annual international symposium on Computer Architecture
Year:
2005

Abstract

The traditional approach for characterizing complex systems is to run standard workloads and measure the resulting performance as seen by the end user. However, unique opportunities exist when characterizing a system that is itself constructed from standardized components: one can also look inside the system itself by instrumenting each of the components. In this paper, we show how intra-box instrumentation can help one understand the behavior of a large-scale storage cluster, the EMC Centera. In our analysis, we leverage standard tools for tracing both the disk and network traffic emanating from each node of the cluster. By correlating this traffic with the running workload, we are able to infer the structure of the software system (e.g., its write update protocol) as well as its policies (e.g., how it performs caching, replication, and load-balancing). Further, by imposing variable intra-box delays on network and disk traffic, we can confirm the causal relationships between network and disk events. Thus, we are able to infer the semantics of the messages between nodes without examining a single line of source code.
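
The delay-injection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' tooling: the `Event` type, the event names (`fwd_to_replica`, `primary_write`, `replica_write`), and all timings are hypothetical, invented only to show the reasoning. The assumption is that two traces of the same request are available, one baseline and one in which a single network message was held back by a known delay; any other event that moves later by roughly that delay is taken to depend causally on the delayed message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str       # "net" or "disk" (hypothetical labels)
    name: str       # hypothetical event identifier
    time_ms: float  # time since the client request was issued


def shift_by_event(baseline, delayed):
    """Map each event name to how much later it occurred in the delayed run."""
    base_times = {e.name: e.time_ms for e in baseline}
    return {e.name: e.time_ms - base_times[e.name]
            for e in delayed if e.name in base_times}


def depends_on_delayed(baseline, delayed, delayed_name, injected_ms,
                       tolerance_ms=1.0):
    """Return names of events that moved by roughly the injected delay.

    The delayed message itself is excluded; events that did not move are
    treated as causally independent of it.
    """
    shifts = shift_by_event(baseline, delayed)
    return sorted(name for name, shift in shifts.items()
                  if name != delayed_name
                  and abs(shift - injected_ms) <= tolerance_ms)


# Invented traces: one write request, observed with and without a 10 ms
# delay imposed on the (hypothetical) message that forwards data to a replica.
baseline = [
    Event("net", "fwd_to_replica", 2.0),
    Event("disk", "primary_write", 3.0),
    Event("disk", "replica_write", 5.0),
]
delayed = [
    Event("net", "fwd_to_replica", 12.0),
    Event("disk", "primary_write", 3.1),
    Event("disk", "replica_write", 15.2),
]

result = depends_on_delayed(baseline, delayed, "fwd_to_replica", 10.0)
print(result)
```

In this invented example only the replica's disk write tracks the injected delay, which would suggest that it is triggered by the forwarded message while the primary's write is not. A real analysis would need many repetitions and several delay values to separate causal shifts from timing noise.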