Mining block correlations to improve storage performance

Authors:
Zhenmin Li; Zhifeng Chen; Yuanyuan Zhou
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL; University of Illinois at Urbana-Champaign, Urbana, IL; University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
ACM Transactions on Storage (TOS)
Year:
2005

Abstract

Block correlations are common semantic patterns in storage systems. They can be exploited to improve the effectiveness of storage caching, prefetching, data layout, and disk scheduling. Unfortunately, information about block correlations is unavailable at the storage system level. Previous approaches for discovering file correlations in file systems do not scale well enough to discover block correlations in storage systems.

In this article, we propose two algorithms, C-Miner and C-Miner*, that use a data mining technique called frequent sequence mining to discover block correlations in storage systems. Both algorithms run reasonably fast with feasible space requirements, indicating that they are practical for dynamically inferring correlations in a storage system. C-Miner is a direct application of a frequent-sequence mining algorithm with a few modifications. C-Miner*, in contrast, is redesigned for mining block correlations, with adjustments for the specific problem of long sequences in storage system traces. As a result, C-Miner* discovers 7% to 109% more correlation rules than C-Miner while running 2 to 15 times faster. We have also evaluated the benefits of block correlation-directed prefetching and data layout through experiments. Our results using real system workloads show that, for most workloads, correlation-directed prefetching and data layout reduce average I/O response time by 12% to 30% compared to the base case, and by 7% to 25% compared to the commonly used sequential prefetching scheme.
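To make the idea of mining block correlations from an access trace concrete, here is a minimal Python sketch. It is not C-Miner or C-Miner*; it only illustrates the general flavor of frequent sequence mining, restricted to ordered pairs of blocks. The function name `mine_pair_rules`, its parameters (`window`, `max_gap`, `min_support`), and the toy trace are all assumptions made for illustration and do not come from the article.

```python
from collections import Counter


def mine_pair_rules(trace, window=4, max_gap=2, min_support=2):
    """Toy pair miner (illustrative only, not the article's algorithm).

    Cuts the trace into non-overlapping windows and counts, for each
    ordered pair (a, b), the number of windows in which b is accessed
    at most `max_gap` positions after a. Returns
    {(a, b): (support, confidence)} for pairs meeting `min_support`.
    """
    block_support = Counter()  # number of windows containing block a
    pair_support = Counter()   # number of windows containing a followed by b

    for start in range(0, len(trace), window):
        segment = trace[start:start + window]
        block_support.update(set(segment))

        pairs = set()  # count each pair at most once per window
        for i, a in enumerate(segment):
            for b in segment[i + 1:i + 1 + max_gap]:
                if a != b:
                    pairs.add((a, b))
        pair_support.update(pairs)

    return {
        (a, b): (count, count / block_support[a])
        for (a, b), count in pair_support.items()
        if count >= min_support
    }


# Hypothetical trace of block numbers: 10, 11, and 12 recur together,
# interleaved with unrelated accesses (50, 60, 70).
trace = [10, 11, 50, 12, 10, 11, 60, 12, 10, 70, 11, 12]
rules = mine_pair_rules(trace)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support}, confidence={confidence:.2f}")
```

On this toy trace, the rules `10 -> 11` and `11 -> 12` survive the support threshold, while pairs involving the one-off blocks are filtered out. A correlation-directed prefetcher could, under the same assumptions, fetch block 11 whenever block 10 is requested.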