Automatic modeling of file system workloads using two-level arrival processes

Authors:
Peter P. Ware;Thomas W. Page, Jr.;Barry L. Nelson
Affiliations:
Ohio State Univ., Columbus;Ohio State Univ., Columbus;Northwestern Univ., Evanston, IL
Venue:
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Year:
1998

Abstract

This article describes a method for analyzing, modeling, and simulating a two-level arrival-counting process. The method is particularly appropriate when the number of independent processes is large, as in our motivating application, which requires analyzing and representing computer file system trace data for activity on nearly 8,000 files. The method is also applicable to network trace data characterizing communication patterns between pairs of computers. We apply cluster analysis to separate the arrival process into groups, or bursts, of activity on a file. We then characterize the arrival process in terms of three components: the time between bursts of activity on a file, the time between file events within a burst, and the number of events in a burst. Finally, we model these three components individually, then reassemble the results to produce a synthetic trace generator. To gauge the effectiveness of the method, we use synthetic (simulated) trace data produced in this way to drive a discrete-event simulation of a distributed replicated file system. We compare the results of the simulation driven by the synthetic trace with those of the same simulation driven by the original trace data, and conclude that the synthetic data capture the essential characteristics of the empirical trace.
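
The two-level decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a fixed gap threshold in place of the cluster analysis, and it resamples the observed values in place of the fitted component models; the abstract specifies neither step in detail. The function names `split_into_bursts`, `characterize`, and `generate_trace`, and the definition of the between-burst time as the gap from the last event of one burst to the first event of the next, are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_into_bursts(times: Sequence[float], gap_threshold: float) -> List[List[float]]:
    """Group sorted event times for one file into bursts.

    Assumption: a gap larger than gap_threshold starts a new burst. This
    simple rule stands in for the cluster analysis used in the article.
    """
    bursts: List[List[float]] = []
    for t in times:
        if bursts and t - bursts[-1][-1] <= gap_threshold:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


def characterize(bursts: List[List[float]]) -> Tuple[List[float], List[float], List[int]]:
    """Return the three components: between-burst times, within-burst times, burst sizes."""
    between = [b[0] - a[-1] for a, b in zip(bursts, bursts[1:])]
    within = [y - x for b in bursts for x, y in zip(b, b[1:])]
    sizes = [len(b) for b in bursts]
    return between, within, sizes


def generate_trace(between: List[float], within: List[float], sizes: List[int],
                   n_bursts: int, rng: random.Random) -> List[float]:
    """Reassemble the components into a synthetic sequence of event times.

    Assumption: each component is resampled independently from its observed
    values; fitted distributions could be substituted for rng.choice.
    """
    if n_bursts > 1 and not between:
        raise ValueError("need at least two observed bursts to sample between-burst times")
    t = 0.0
    trace: List[float] = []
    for i in range(n_bursts):
        if i > 0:
            t += rng.choice(between)  # level 1: time between bursts
        trace.append(t)
        for _ in range(rng.choice(sizes) - 1):  # number of events in this burst
            t += rng.choice(within)  # level 2: time between events in a burst
            trace.append(t)
    return trace


# Made-up event times for a single file, in arbitrary time units.
observed = [0.0, 0.5, 1.0, 10.0, 10.2, 25.0]
bursts = split_into_bursts(observed, gap_threshold=2.0)
between, within, sizes = characterize(bursts)
synthetic = generate_trace(between, within, sizes, n_bursts=5, rng=random.Random(0))
```

In a trace covering many files, the same steps would be applied per file, or to the components pooled across files. The abstract does not say which, so that choice is left open here.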