Using machine learning techniques to enhance the performance of an automatic backup and recovery system

Authors:
Avichai Giat; Dan Pelleg; Eran Raichstein; Amir Ronen
Affiliations:
IBM, Haifa Research Lab, Haifa University Campus, Mount Carmel, Haifa, Israel; Haifa Research Lab, Haifa University Campus, Mount Carmel, Haifa, Israel; IBM Software Group, Matam, Haifa, Israel; Haifa Research Lab, Haifa University Campus, Mount Carmel, Haifa, Israel
Venue:
Proceedings of the 3rd Annual Haifa Experimental Systems Conference
Year:
2010

Abstract

A typical disaster recovery system keeps mirrored storage at a site that is geographically separate from the main operational site. In many cases, communication between the local site and the backup repository site is carried over a network that is either inherently slow, such as a WAN, or heavily loaded, for example during a whole-site disaster recovery operation. The goal of this work is to reduce the performance impact of the network in such scenarios using machine learning techniques. We focus on two main areas: prefetching and read-ahead size determination. In both, we significantly improve the performance of the system. Our main contributions are as follows. We introduce a theoretical model of the system and of the problem we are trying to solve, and use it to bound the gain achievable by prefetching. We construct two frequent pattern mining algorithms and use them for prefetching. We also present a framework for controlling and combining multiple prefetch algorithms. These algorithms, together with several simple prefetch algorithms, are compared in a simulation environment. We introduce a novel algorithm for determining the amount of read-ahead in such a system, based on intuition from online competitive analysis and on regression techniques. The significant positive impact of this algorithm is demonstrated on IBM's FastBack system. Many of our improvements were applied with little or no modification to the internals of the existing implementation. We are therefore confident that the techniques are general and are likely to have applications elsewhere.
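The abstract does not spell out the two frequent pattern mining algorithms. As a rough illustration only, the Python sketch below mines one simple kind of sequential pattern from a block-access trace: ordered pairs (a, b) in which block b is read within a short window after block a. When block a is read again, the blocks that frequently followed it become prefetch candidates. The function names, the `window` and `min_support` parameters, and the pair-based pattern definition are hypothetical simplifications, not the authors' algorithms.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def mine_successor_patterns(
    trace: Sequence[int], window: int = 2, min_support: int = 2
) -> Dict[int, List[int]]:
    """Hypothetical miner: count ordered pairs (a, b) where block b is read
    within `window` accesses after block a, keeping pairs seen at least
    `min_support` times."""
    pair_counts: Counter = Counter()
    for i, a in enumerate(trace):
        # Look ahead at most `window` accesses; ignore repeats of the same block.
        for b in trace[i + 1 : i + 1 + window]:
            if b != a:
                pair_counts[(a, b)] += 1
    rules: Dict[int, List[int]] = defaultdict(list)
    # Sort by descending support so the strongest successors come first.
    for (a, b), count in sorted(pair_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_support:
            rules[a].append(b)
    return dict(rules)


def prefetch_candidates(rules: Dict[int, List[int]], block: int, limit: int = 4) -> List[int]:
    """Hypothetical helper: blocks to fetch ahead of demand after `block` is read."""
    return rules.get(block, [])[:limit]


if __name__ == "__main__":
    trace = [1, 2, 3, 1, 2, 3, 7, 1, 2, 9]
    rules = mine_successor_patterns(trace, window=1, min_support=2)
    print(rules)  # {1: [2], 2: [3]}
    print(prefetch_candidates(rules, 1))  # [2]
```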
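The read-ahead algorithm is likewise only summarized. The sketch below shows one hypothetical way to combine the two ingredients the abstract names. The first is a least-squares (regression) fit of read time over the slow link as a fixed per-request latency plus a per-KB transfer cost. The second is a doubling read-ahead window, a standard idea from online competitive analysis, that grows geometrically while a stream stays sequential and resets on a random access. Using the fitted model to choose the window's cap (the smallest request size at which fixed latency is at most a chosen fraction of total request time) is an assumption of this sketch, and all names and defaults are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


def fit_cost_model(samples: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for: seconds = latency + per_kb * size_kb.
    `samples` are hypothetical (size_kb, seconds) observations of past reads.
    Requires at least two distinct sizes. Returns (latency, per_kb)."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    per_kb = sxy / sxx
    latency = mean_y - per_kb * mean_x
    return latency, per_kb


def amortizing_size_kb(latency: float, per_kb: float, max_overhead: float = 0.1) -> float:
    """Smallest size s with L / (L + c*s) <= f, i.e. s >= L*(1 - f) / (c*f)."""
    return latency * (1 - max_overhead) / (per_kb * max_overhead)


@dataclass
class ReadAheadController:
    """Hypothetical doubling policy: grow the window geometrically while reads
    stay sequential, and drop it to zero on a non-sequential read."""
    min_kb: int = 64
    max_kb: int = 4096
    window_kb: int = 0
    next_offset_kb: int = -1

    def on_read(self, offset_kb: int, size_kb: int) -> int:
        """Return how many KB to read ahead after this demand read."""
        if offset_kb == self.next_offset_kb:
            # Sequential continuation: double the window, bounded by [min_kb, max_kb].
            self.window_kb = min(max(self.window_kb * 2, self.min_kb), self.max_kb)
        else:
            # Random access: do not speculate.
            self.window_kb = 0
        self.next_offset_kb = offset_kb + size_kb
        return self.window_kb


if __name__ == "__main__":
    latency, per_kb = fit_cost_model([(10, 0.06), (20, 0.07), (40, 0.09)])
    cap = int(round(amortizing_size_kb(latency, per_kb, max_overhead=0.1)))
    ctl = ReadAheadController(min_kb=64, max_kb=cap)
    for off in (0, 4, 8, 12, 16, 20, 1000):
        print(off, ctl.on_read(off, 4))
```

In a real system the returned window would be applied only to blocks that are not already cached or in flight.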