Efficiently identifying working sets in block I/O streams

Authors:
Avani Wildani;Ethan L. Miller;Lee Ward
Affiliations:
Univ. of California, Santa Cruz, Santa Cruz, California;Univ. of California, Santa Cruz, Santa Cruz, California;Computer Science Research Institute, Sandia National Labs, Albuquerque, NM
Venue:
Proceedings of the 4th Annual International Conference on Systems and Storage
Year:
2011

Abstract

Identifying groups of blocks that tend to be read or written together in a given environment is the first step towards powerful techniques for device failure isolation and power management. For example, identified groups can be placed together on a single disk, avoiding excess drive activity across an exascale storage system. Unlike previous grouping work, we focus on identifying groupings in data that can be gathered from real, running systems with minimal impact. Using temporal, spatial, and access ordering information from an enterprise data set, we identified a set of groupings that appear consistently, indicating that they are working sets likely to be accessed together. We present several techniques to obtain groupings, along with a discussion of which techniques best apply to particular types of real systems. We intend to use these preliminary results to inform our search for new types of workloads. The goal is to identify properties of easily separable workloads across different systems and to dynamically move groups within these workloads, reducing disk activity in large storage systems.
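
To make the idea of "groupings that consistently appear" concrete, the following is a minimal sketch and not the authors' method, which the abstract does not specify. It assumes a trace of (timestamp, block address) pairs sorted by time, splits it wherever the gap between consecutive accesses exceeds a threshold, and reports the block sets that recur. The function name `candidate_working_sets` and the parameters `max_gap` and `min_count` are hypothetical, and only temporal information is used here; spatial and ordering information are left out for brevity.

```python
from collections import Counter


def candidate_working_sets(trace, max_gap=0.5, min_count=2):
    """Return block sets that recur as bursts of temporally close accesses.

    trace: iterable of (timestamp_seconds, block_address), sorted by timestamp.
    max_gap: hypothetical threshold; a longer idle gap closes the current burst.
    min_count: how many times a block set must recur to be reported.
    """
    counts = Counter()
    current = set()
    last_time = None
    for timestamp, block in trace:
        # A gap longer than max_gap ends the current burst of accesses.
        if last_time is not None and timestamp - last_time > max_gap:
            counts[frozenset(current)] += 1
            current = set()
        current.add(block)
        last_time = timestamp
    if current:
        counts[frozenset(current)] += 1
    # Keep only multi-block sets that were seen at least min_count times.
    return [set(group) for group, seen in counts.items()
            if seen >= min_count and len(group) > 1]


# Illustrative, made-up trace: blocks 10 and 11 are accessed together twice.
example_trace = [(0.0, 10), (0.1, 11), (5.0, 99),
                 (10.0, 11), (10.1, 10), (20.0, 42)]
groups = candidate_working_sets(example_trace)
```

In this made-up trace, blocks 10 and 11 form the only recurring multi-block set. A real system would likely need a distance measure that also weighs block offsets and access order, as the abstract suggests.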