The performance evaluation of large file systems, such as those used for storage and media streaming, motivates the scalable generation of representative traces. We focus on two key characteristics of traces: popularity and temporal locality. The common practice of using a system-wide distribution obscures per-object behavior, which is important for system evaluation. We propose a model based on delayed renewal processes which, by sampling interarrival times for each object, accurately reproduces the popularity and temporal locality of the trace. A lightweight version reduces the dimension of the model with statistical clustering. The model is workload-agnostic and object type-aware, making it suitable for testing emerging workloads and 'what-if' scenarios. We implemented a synthetic trace generator and validated it using: (1) a Big Data storage (HDFS) workload from Yahoo!, (2) a trace from a feature animation company, and (3) a streaming media workload. Two case studies in caching and replicated distributed storage systems show that our traces produce application-level results similar to the real workload. The trace generator is fast and readily scales to a system of 4.3 million files. It reproduces the characteristics of the real trace more accurately than existing models.
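The per-object sampling idea can be illustrated with a minimal sketch. This is not the authors' generator. It assumes each object comes with two hypothetical samplers: one for the delay until its first access (the "delayed" part of a delayed renewal process) and one for the interarrival times between later accesses. The names `generate_trace` and `exponential`, and the choice of exponential distributions in the example, are illustrative assumptions only.

```python
import heapq
import random


def generate_trace(objects, duration, seed=0):
    """Merge one delayed renewal process per object into a single trace.

    objects:  dict mapping object id -> (first_delay_sampler, interarrival_sampler);
              each sampler takes a random.Random and returns a non-negative float.
    duration: accesses scheduled after this time are discarded.
    Returns a list of (timestamp, object_id) tuples sorted by timestamp.
    """
    rng = random.Random(seed)
    heap = []
    # The first access of each object uses its own delay distribution.
    for obj_id, (first_delay, _) in objects.items():
        t = first_delay(rng)
        if t <= duration:
            heapq.heappush(heap, (t, obj_id))

    trace = []
    while heap:
        t, obj_id = heapq.heappop(heap)
        trace.append((t, obj_id))
        # Later accesses are spaced by the object's interarrival distribution.
        next_t = t + objects[obj_id][1](rng)
        if next_t <= duration:
            heapq.heappush(heap, (next_t, obj_id))
    return trace


def exponential(mean):
    """Hypothetical sampler factory; a fitted empirical distribution could be used instead."""
    return lambda rng: rng.expovariate(1.0 / mean)


# Illustrative parameters: a frequently accessed object and a rarely accessed one.
example_objects = {
    "hot_file": (exponential(1.0), exponential(2.0)),
    "cold_file": (exponential(50.0), exponential(40.0)),
}
example_trace = generate_trace(example_objects, duration=100.0, seed=42)
```

In this sketch, popularity emerges from how many renewals each object completes within `duration`, and temporal locality from the shape of its interarrival distribution. A clustered variant along the lines of the lightweight version could share one pair of samplers among all objects assigned to the same cluster instead of keeping a pair per object.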