A typical disaster recovery system has mirrored storage at a site that is geographically separate from the main operational site. In many cases, communication between the local site and the backup repository site is performed over a network that is inherently slow, such as a WAN, or is highly strained, for example due to a whole-site disaster recovery operation. The goal of this work is to alleviate the performance impact of the network in such a scenario, and to do so using machine learning techniques. We focus on two main areas: prefetching and read-ahead size determination. In both cases we significantly improve the performance of the system. Our main contributions are as follows. We introduce a theoretical model of the system and of the problem we are trying to solve, and we bound the gain from prefetching techniques. We construct two frequent pattern mining algorithms and use them for prefetching. A framework for controlling and combining multiple prefetch algorithms is presented as well. These algorithms, as well as various simple prefetch algorithms, are compared in a simulation environment. We introduce a novel algorithm for determining the amount of read-ahead on such a system that is based on intuition from online competitive analysis and on regression techniques. The significant positive impact of this algorithm is demonstrated on IBM's FastBack system. Many of our improvements have been applied with little or no modification of the current implementation's internals. We therefore feel confident in stating that the techniques are general and are likely to have applications elsewhere.
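To make the idea of pattern-based prefetching concrete, the following is a minimal sketch and not the algorithms of this work, whose details the abstract does not give. It assumes a trace of block numbers and counts which blocks tend to follow a given block within a short window. The names `mine_successors` and `prefetch_candidates`, and the `window`, `min_support`, and `limit` parameters, are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def mine_successors(trace, window=2, min_support=2):
    """Illustrative only: count how often block b follows block a
    within `window` accesses, and keep pairs seen often enough."""
    counts = defaultdict(Counter)
    for i, a in enumerate(trace):
        for b in trace[i + 1 : i + 1 + window]:
            if b != a:
                counts[a][b] += 1
    # Keep only successors observed at least `min_support` times,
    # ordered from most to least frequent.
    return {
        a: [b for b, n in followers.most_common() if n >= min_support]
        for a, followers in counts.items()
    }


def prefetch_candidates(rules, block, limit=2):
    """Return up to `limit` blocks worth prefetching after `block`."""
    return rules.get(block, [])[:limit]


# Hypothetical access trace: block 11 repeatedly follows block 10.
trace = [10, 11, 50, 10, 11, 51, 10, 11, 52]
rules = mine_successors(trace)
print(prefetch_candidates(rules, 10))  # prints [11]
```

In this toy trace, block 11 follows block 10 three times, so a read of block 10 would trigger a prefetch of block 11, while blocks seen only once after it fall below the support threshold and are ignored.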