Extracting flexible, replayable models from large block traces

Authors:
V. Tarasov;S. Kumar;J. Ma;D. Hildebrand;A. Povzner;G. Kuenning;E. Zadok
Affiliations:
Stony Brook University;Stony Brook University;Harvey Mudd College;IBM Almaden Research;IBM Almaden Research;Harvey Mudd College;Stony Brook University
Venue:
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Year:
2012

Abstract

I/O traces are good sources of information about real-world workloads; replaying such traces is often used to reproduce the most realistic system behavior possible. But traces tend to be large, hard to use and share, and inflexible in representing more than the exact system conditions at the point the traces were captured. Often, however, researchers are not interested in the precise details stored in a bulky trace, but rather in some statistical properties found in the traces, namely those that affect their system's behavior under load. We designed and built a system that (1) extracts many desired properties from a large block I/O trace, (2) builds a statistical model of the trace's salient characteristics, (3) converts the model into a concise description in the language of one or more synthetic load generators, and (4) can accurately replay the models in these load generators. Our system is modular and extensible. We experimented with several traces of varying types and sizes. Our concise models are 4-6% of the original trace size, and our modeling and replay accuracy are over 90%.
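To make the extract, model, and replay idea concrete, here is a minimal Python sketch. It is not the authors' system: the `Request` record, the `build_model` and `synthesize` helpers, and the choice of a single joint (operation, size) distribution with uniformly random offsets are all assumptions made for illustration. A real model would capture more properties of the trace.

```python
from collections import Counter
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Request:
    """One block I/O request (hypothetical record layout)."""
    timestamp: float  # seconds since trace start
    op: str           # "R" for read, "W" for write
    offset: int       # byte offset on the device
    size: int         # request size in bytes


def build_model(trace):
    """Summarize a trace as an empirical (op, size) distribution plus a rate."""
    n = len(trace)
    if n == 0:
        raise ValueError("empty trace")
    counts = Counter((r.op, r.size) for r in trace)
    duration = trace[-1].timestamp - trace[0].timestamp
    return {
        "requests": n,
        # Fall back to n requests/second when all timestamps are equal.
        "rate_per_sec": n / duration if duration > 0 else float(n),
        "op_size_prob": {key: c / n for key, c in counts.items()},
    }


def synthesize(model, device_bytes, seed=0):
    """Generate a synthetic request stream that follows the model.

    Offsets are drawn uniformly at 512-byte alignment (an assumption);
    arrivals are evenly spaced at the modeled rate.
    """
    rng = random.Random(seed)
    keys = list(model["op_size_prob"])
    weights = [model["op_size_prob"][k] for k in keys]
    gap = 1.0 / model["rate_per_sec"]
    out = []
    for i in range(model["requests"]):
        op, size = rng.choices(keys, weights=weights)[0]
        offset = rng.randrange(0, device_bytes - size + 1, 512)
        out.append(Request(i * gap, op, offset, size))
    return out


# Tiny hand-made trace: three 4 KiB reads and one 8 KiB write.
trace = [
    Request(0.0, "R", 0, 4096),
    Request(1.0, "R", 4096, 4096),
    Request(2.0, "W", 8192, 8192),
    Request(3.0, "R", 16384, 4096),
]
model = build_model(trace)
synthetic = synthesize(model, device_bytes=1 << 20)
```

The model dictionary grows with the number of distinct (operation, size) pairs rather than with the number of requests, which is the intuition behind a model being far smaller than the trace it summarizes.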