Capture, conversion, and analysis of an intense NFS workload

Authors:
Eric Anderson
Affiliations:
HP Labs
Venue:
FAST '09 Proceedings of the 7th conference on File and storage technologies
Year:
2009

Citing 16
Cited 11

Immediate files

Software—Practice & Experience

On the self-similar nature of Ethernet traffic (extended version)
IEEE/ACM Transactions on Networking (TON)

Approximate medians and other quantiles in one pass and with limited memory
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

Self-similarity in file systems
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems

A trace-driven analysis of the UNIX 4.2 BSD file system
Proceedings of the tenth ACM symposium on Operating systems principles

A relational model of data for large shared data banks
Communications of the ACM

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Data Mining and Knowledge Discovery

New NFS Tracing Tools and Techniques for System Analysis
LISA '03 Proceedings of the 17th USENIX conference on System administration

Passive NFS Tracing of Email and Research Workloads
FAST '03 Proceedings of the 2nd USENIX Conference on File and Storage Technologies

Buttress: A Toolkit for Flexible and High Fidelity I/O Benchmarking
FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies

MEMS-based Storage Devices and Standard Disk Interfaces: A Square Peg in a Round Hole?
FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies

The devil and packet trace anonymization
ACM SIGCOMM Computer Communication Review

TBBT: scalable and accurate trace replay for file server evaluation
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4

A flash-memory based file system
TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings

Measurement and analysis of large-scale network file system workloads
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference

DataSeries: an efficient, flexible data format for structured serial data
ACM SIGOPS Operating Systems Review

Discovery of application workloads from network file traces
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies

Improving the efficiency of information collection and analysis in widely-used IT applications
Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering

Understanding and Improving Computational Science Storage Access through Continuous Characterization
ACM Transactions on Storage (TOS)

Analysis of Workload Behavior in Scientific and Historical Long-Term Data Repositories
ACM Transactions on Storage (TOS)

Extracting flexible, replayable models from large block traces
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies

LoadIQ: learning to identify workload phases from a live storage trace
HotStorage'12 Proceedings of the 4th USENIX conference on Hot Topics in Storage and File Systems

Workload diversity and dynamics in big data analytics: implications to system designers
Proceedings of the 2nd Workshop on Architectures and Systems for Big Data

Usage behavior of a large-scale scientific archive
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Building intelligence for software defined data centers: modeling usage patterns
Proceedings of the 6th International Systems and Storage Conference

Generating request streams on Big Data using clustered renewal processes
Performance Evaluation

Characterization of incremental data changes for efficient data protection
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference

Abstract

We describe methods to capture, convert, store and analyze NFS workloads that are 20-100× more intense, in terms of operations/day, than any previously published. We describe three techniques that improve capture performance by up to 10× over previous techniques. For conversion, we use a general-purpose format that is both highly space efficient and provides efficient access to the trace data. For analysis, we describe a number of techniques adopted from the database community and some new techniques that facilitate analysis of very large traces. We also describe a number of guidelines for trace collection that should prove useful to future practitioners. Finally, we analyze a commercial feature animation (movie) rendering workload using these techniques and discuss the characteristics of the workload. Our implementation of these techniques is available as open source and the exact anonymized datasets we analyze are available for free download.