Measurements of a distributed file system
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Input/output access pattern classification using hidden Markov models
Proceedings of the fifth workshop on I/O in parallel and distributed systems
The String-to-String Correction Problem
Journal of the ACM (JACM)
Automatic ARIMA Time Series Modeling for Adaptive I/O Prefetching
IEEE Transactions on Parallel and Distributed Systems
Trace-based analyses and optimizations for network storage servers
Trace-based analyses and optimizations for network storage servers
New NFS Tracing Tools and Techniques for System Analysis
LISA '03 Proceedings of the 17th USENIX conference on System administration
File Classification in Self-* Storage Systems
ICAC '04 Proceedings of the First International Conference on Autonomic Computing
Passive NFS Tracing of Email and Research Workloads
FAST '03 Proceedings of the 2nd USENIX Conference on File and Storage Technologies
Using magpie for request extraction and workload modelling
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
A comparison of file system workloads
ATEC '00 Proceedings of the annual conference on USENIX Annual Technical Conference
Categorizing and differencing system behaviours
HotAC II Hot Topics in Autonomic Computing on Hot Topics in Autonomic Computing
Measurement and analysis of large-scale network file system workloads
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
Capture, conversion, and analysis of an intense NFS workload
FAST '09 Proccedings of the 7th conference on File and storage technologies
Efficiently identifying working sets in block I/O streams
Proceedings of the 4th Annual International Conference on Systems and Storage
Revisiting the storage stack in virtualized NAS environments
WIOV'11 Proceedings of the 3rd conference on I/O virtualization
Extracting flexible, replayable models from large block traces
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
LoadIQ: learning to identify workload phases from a live storage trace
HotStorage'12 Proceedings of the 4th USENIX conference on Hot Topics in Storage and File Systems
Systems research and innovation in data ONTAP
ACM SIGOPS Operating Systems Review
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
ROOT: replaying multithreaded traces with resource-oriented ordering
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Virtual machine workloads: the case for new benchmarks for NAS
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
Automatic identification of application I/O signatures from noisy server-side traces
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
Hi-index | 0.00 |
An understanding of application I/O access patterns is useful in several situations. First, gaining insight into what applications are doing with their data at a semantic level helps in designing efficient storage systems. Second, it helps create benchmarks that mimic realistic application behavior closely. Third, it enables autonomic systems as the information obtained can be used to adapt the system in a closed loop. All these use cases require the ability to extract the application-level semantics of I/O operations. Methods such as modifying application code to associate I/O operations with semantic tags are intrusive. It is well known that network file system traces are an important source of information that can be obtained non-intrusively and analyzed either online or offline. These traces are a sequence of primitive file system operations and their parameters. Simple counting, statistical analysis or deterministic search techniques are inadequate for discovering application-level semantics in the general case, because of the inherent variation and noise in realistic traces. In this paper, we describe a trace analysis methodology based on Profile Hidden Markov Models. We show that the methodology has powerful discriminatory capabilities that enable it to recognize applications based on the patterns in the traces, and to mark out regions in a long trace that encapsulate sets of primitive operations that represent higher-level application actions. It is robust enough that it can work around discrepancies between training and target traces such as in length and interleaving with other operations. We demonstrate the feasibility of recognizing patterns based on a small sampling of the trace, enabling faster trace analysis. Preliminary experiments show that the method is capable of learning accurate profile models on live traces in an online setting. We present a detailed evaluation of this methodology in a UNIX environment using NFS traces of selected commonly used applications such as compilations as well as on industrial strength benchmarks such as TPC-C and Postmark, and discuss its capabilities and limitations in the context of the use cases mentioned above.