C-Miner: Mining Block Correlations in Storage Systems

Authors:
Zhenmin Li; Zhifeng Chen; Sudarshan M. Srinivasan; Yuanyuan Zhou
Affiliations:
University of Illinois at Urbana-Champaign (all four authors)
Venue:
FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
Year:
2004

Abstract

Block correlations are common semantic patterns in storage systems. They can be exploited to improve the effectiveness of storage caching, prefetching, data layout, and disk scheduling. Unfortunately, information about block correlations is not available at the storage system level, and previous approaches for discovering file correlations in file systems do not scale well enough to discover block correlations in storage systems. In this paper, we propose C-Miner, an algorithm that uses a data mining technique called frequent sequence mining to discover block correlations in storage systems. C-Miner runs reasonably fast with feasible space requirements, indicating that it is a practical tool for dynamically inferring correlations in a storage system. We also evaluate the benefits of block correlation-directed prefetching and data layout experimentally. Our results using real system workloads show that correlation-directed prefetching and data layout can reduce average I/O response time by 12-25% compared to the base case, and by 7-20% compared to the commonly used sequential prefetching scheme.
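
To make the idea of mining block correlations from an access stream concrete, the following Python sketch shows a much simplified stand-in. It is not the C-Miner algorithm: it only counts ordered block pairs rather than mining full frequent sequences. The function names (mine_rules, prefetch_candidates), the parameters (window, max_gap, min_support, min_confidence), and the sample trace are all assumptions chosen for illustration and do not come from the paper.

```python
from collections import Counter, defaultdict


def mine_rules(trace, window=4, max_gap=2, min_support=2, min_confidence=0.5):
    """Derive 'block a -> block b' rules from a block access trace.

    Illustrative simplification (not C-Miner itself): the trace is cut into
    non-overlapping windows, and an ordered pair (a, b) is counted once per
    window if b appears at most max_gap positions after a.
    """
    pair_support = Counter()   # number of windows containing the pair (a, b)
    block_support = Counter()  # number of windows containing block a
    for start in range(0, len(trace), window):
        segment = trace[start:start + window]
        block_support.update(set(segment))
        pairs = set()
        for i, a in enumerate(segment):
            for b in segment[i + 1:i + 1 + max_gap]:
                if a != b:
                    pairs.add((a, b))
        pair_support.update(pairs)

    rules = defaultdict(list)
    for (a, b), n in pair_support.items():
        # Keep pairs that are frequent and whose confidence is high enough.
        if n >= min_support and n / block_support[a] >= min_confidence:
            rules[a].append(b)
    return {a: sorted(bs) for a, bs in rules.items()}


def prefetch_candidates(rules, block):
    """Blocks a correlation-directed prefetcher might fetch after 'block'."""
    return rules.get(block, [])


if __name__ == "__main__":
    # Hypothetical trace of block numbers, made up for this example.
    trace = [10, 11, 50, 7, 10, 11, 50, 3, 10, 9, 11, 50]
    rules = mine_rules(trace)
    print(rules)
    print(prefetch_candidates(rules, 10))
```

In this toy setting, a rule such as "block 10 is usually followed by blocks 11 and 50" can capture a correlation between non-adjacent block numbers, which a purely sequential prefetcher would not detect.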