Discovery of application workloads from network file traces

Authors:
Neeraja J. Yadwadkar;Chiranjib Bhattacharyya;K. Gopinath;Thirumale Niranjan;Sai Susarla
Affiliations:
Department of Computer Science and Automation, Indian Institute of Science;Department of Computer Science and Automation, Indian Institute of Science;Department of Computer Science and Automation, Indian Institute of Science;NetApp Advanced Technology Group;NetApp Advanced Technology Group
Venue:
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Year:
2010

Abstract

An understanding of application I/O access patterns is useful in several situations. First, insight into what applications are doing with their data at a semantic level helps in designing efficient storage systems. Second, it helps create benchmarks that closely mimic realistic application behavior. Third, it enables autonomic systems, because the information obtained can be used to adapt the system in a closed loop. All these use cases require the ability to extract the application-level semantics of I/O operations. Methods such as modifying application code to associate I/O operations with semantic tags are intrusive. Network file system traces, by contrast, are a well-known source of information that can be obtained non-intrusively and analyzed either online or offline. These traces are sequences of primitive file system operations and their parameters. Simple counting, statistical analysis, or deterministic search techniques are inadequate for discovering application-level semantics in the general case because of the inherent variation and noise in realistic traces. In this paper, we describe a trace analysis methodology based on Profile Hidden Markov Models. We show that the methodology has powerful discriminatory capabilities that enable it to recognize applications based on the patterns in the traces, and to mark out regions in a long trace that encapsulate sets of primitive operations representing higher-level application actions. It is robust enough to work around discrepancies between training and target traces, such as differences in length and interleaving with other operations. We demonstrate the feasibility of recognizing patterns based on a small sample of the trace, which enables faster trace analysis. Preliminary experiments show that the method is capable of learning accurate profile models on live traces in an online setting.
We present a detailed evaluation of this methodology in a UNIX environment using NFS traces of selected commonly used applications, such as compilations, as well as industrial-strength benchmarks such as TPC-C and Postmark, and discuss its capabilities and limitations in the context of the use cases mentioned above.
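
To make the idea of recognizing an application from its trace more concrete, the following is a minimal sketch of Viterbi scoring against a profile HMM, written for illustration only. It is not the authors' implementation: the transition probabilities, the uniform insert-state emissions, the `build_profile` helper, the opcode alphabet, and the two "application" consensus sequences are all hypothetical assumptions. In practice the emission and transition parameters would be learned from training traces rather than fixed by hand.

```python
import math

NEG_INF = float("-inf")

# Hypothetical, position-independent transition probabilities between
# match (M), insert (I), and delete (D) states. Assumed values, not learned.
TRANS = {
    ("M", "M"): 0.8, ("M", "I"): 0.1, ("M", "D"): 0.1,
    ("I", "M"): 0.6, ("I", "I"): 0.3, ("I", "D"): 0.1,
    ("D", "M"): 0.6, ("D", "I"): 0.1, ("D", "D"): 0.3,
}


def build_profile(consensus, alphabet, match_prob=0.9):
    """Return one match-state emission table per consensus position.

    Hypothetical helper: the consensus opcode gets match_prob and the
    remaining mass is spread evenly over the other opcodes.
    """
    other = (1.0 - match_prob) / (len(alphabet) - 1)
    return [{op: (match_prob if op == c else other) for op in alphabet}
            for c in consensus]


def viterbi_score(profile, alphabet, trace):
    """Log probability of the best state path that emits `trace`.

    Every opcode in `trace` must belong to `alphabet`.
    """
    n_pos, n_ops = len(profile), len(trace)
    background = math.log(1.0 / len(alphabet))  # insert-state emission
    t = {pair: math.log(p) for pair, p in TRANS.items()}
    # V[s][k][i]: best log probability of emitting trace[:i] and ending
    # in state s at profile position k.
    V = {s: [[NEG_INF] * (n_ops + 1) for _ in range(n_pos + 1)]
         for s in "MID"}
    V["M"][0][0] = 0.0  # the begin state is treated as M at position 0
    for k in range(n_pos + 1):
        for i in range(n_ops + 1):
            if k > 0 and i > 0:  # match: consume one opcode, advance profile
                emit = math.log(profile[k - 1][trace[i - 1]])
                V["M"][k][i] = emit + max(
                    V[s][k - 1][i - 1] + t[(s, "M")] for s in "MID")
            if i > 0:  # insert: consume one opcode, stay at position k
                V["I"][k][i] = background + max(
                    V[s][k][i - 1] + t[(s, "I")] for s in "MID")
            if k > 0:  # delete: skip a profile position, consume nothing
                V["D"][k][i] = max(
                    V[s][k - 1][i] + t[(s, "D")] for s in "MID")
    return max(V[s][n_pos][n_ops] for s in "MID")


# Hypothetical opcode alphabet and per-application consensus sequences.
ALPHABET = ["lookup", "getattr", "read", "write", "create", "commit"]
PROFILES = {
    "compile": build_profile(
        ["lookup", "getattr", "read", "create", "write", "commit"], ALPHABET),
    "bulk_write": build_profile(
        ["create", "write", "write", "write", "commit", "getattr"], ALPHABET),
}

# A trace resembling the "compile" consensus with one interleaved getattr.
noisy_trace = ["lookup", "getattr", "getattr", "read",
               "create", "write", "commit"]
scores = {name: viterbi_score(p, ALPHABET, noisy_trace)
          for name, p in PROFILES.items()}
best_label = max(scores, key=scores.get)
print(best_label, scores)
```

The insert and delete states are what let the model tolerate the kind of discrepancies mentioned above: an interleaved operation is absorbed by an insert state, and a missing operation by a delete state, at a probability cost rather than an outright mismatch. Comparing raw Viterbi scores is only reasonable for traces of equal length; a log-odds score against a background model would be a common refinement.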