Memory Management for Scalable Web Data Servers

Authors:
Shivakumar Venkataraman; Miron Livny; Jeffrey F. Naughton
Venue:
ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Year:
1997

Abstract

Popular Web sites already experience very heavy loads, and these loads will only increase as the number of users accessing them grows. These loads create both CPU and I/O bottlenecks. One promising solution to the CPU bottleneck, already in use, is to replace a single-processor server with a cluster of servers. Our goal in this paper is to develop buffer management algorithms that exploit the aggregate memory capacity of the machines in such a server cluster to attack the I/O bottleneck. The key challenge in designing such algorithms turns out to be controlling data replication so as to achieve a good balance between intra-cluster network traffic and disk I/O. At one extreme, applying client-server memory management techniques directly to this cluster architecture duplicates data in memory across the servers, which tends to reduce network traffic but increase disk I/O. At the other extreme, eliminating all duplicates tends to increase network traffic while reducing disk I/O. Accordingly, we present a new algorithm, Hybrid, that dynamically controls the amount of duplication. Through a detailed simulation, we show that on workloads characteristic of those experienced by Web servers, the Hybrid algorithm correctly trades off intra-cluster network traffic and disk I/O to minimize average response time.
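
The trade-off described in the abstract can be illustrated with a small cost model. The sketch below is not the Hybrid algorithm, whose mechanism the abstract does not specify. It is a toy model built on assumptions: a Zipf-like page popularity, requests spread uniformly over the nodes, and hypothetical access costs `t_local`, `t_remote`, and `t_disk`. Each node reserves a fraction of its memory for copies of the hottest pages and uses the rest for pages held by only one node in the cluster.

```python
def zipf_weights(num_pages, skew=1.0):
    """Assumed Zipf-like popularity: weight of rank r is proportional to 1/r^skew."""
    raw = [1.0 / (rank ** skew) for rank in range(1, num_pages + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def expected_cost(dup_fraction, nodes, pages_per_node, popularity,
                  t_local=0.1, t_remote=1.0, t_disk=15.0):
    """Expected cost of one page access in the toy model.

    dup_fraction is the share of each node's memory that holds the same
    hottest pages on every node; the remainder holds pages cached on
    exactly one node. The cost constants are hypothetical and unitless.
    """
    dup_pages = int(dup_fraction * pages_per_node)
    unique_pages = nodes * (pages_per_node - dup_pages)
    cached = min(len(popularity), dup_pages + unique_pages)

    p_dup = sum(popularity[:dup_pages])            # always a local hit
    p_unique = sum(popularity[dup_pages:cached])   # in memory somewhere in the cluster
    p_disk = max(0.0, 1.0 - p_dup - p_unique)      # not cached anywhere

    # A singly cached page is local only if the request lands on its owner.
    unique_cost = (t_local + (nodes - 1) * t_remote) / nodes
    return p_dup * t_local + p_unique * unique_cost + p_disk * t_disk


def best_dup_fraction(nodes, pages_per_node, popularity, steps=20):
    """Sweep duplication fractions on a grid and return the cheapest one."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda d: expected_cost(d, nodes, pages_per_node, popularity))


if __name__ == "__main__":
    popularity = zipf_weights(num_pages=10_000)
    for d in (0.0, 0.5, 1.0):
        cost = expected_cost(d, nodes=8, pages_per_node=500, popularity=popularity)
        print(f"dup_fraction={d:.1f} expected cost={cost:.3f}")
    print("best:", best_dup_fraction(8, 500, popularity))
```

In this model, raising `dup_fraction` turns remote hits into local hits but shrinks the set of distinct pages held in cluster memory, so more accesses go to disk. Lowering it does the opposite. Where the minimum falls depends entirely on the assumed costs and popularity skew, which is why a policy that adjusts duplication dynamically is attractive.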