Reliability algorithms for network swapping systems with page migration

  • Authors:
  • B. Mitchell;J. Rosse;T. Newhall

  • Affiliations:
  • Swarthmore Coll., PA, USA;Swarthmore Coll., PA, USA;Swarthmore Coll., PA, USA

  • Venue:
  • CLUSTER '04 Proceedings of the 2004 IEEE International Conference on Cluster Computing
  • Year:
  • 2004

Abstract

Summary form only given. Network swapping systems allow individual cluster nodes with over-committed memory to use the idle memory of remote nodes as their backing store and to swap pages over the network. Without reliability support, a single node crash can affect programs running on other nodes by losing their remotely swapped page data. RAID-based reliability solutions (Patterson et al., 1988; Markatos and Dramitinos, 1996) are the most promising alternative in terms of flexibility and performance. However, two important features of our network swapping system, Nswap (Newhall et al., 2003), make direct application of RAID-based schemes impossible. First, Nswap adapts to each node's local memory load by adjusting the amount of RAM it makes available for remote swapping, which results in a variable-capacity "backing store". Second, Nswap supports migration of remotely swapped pages between cluster nodes, which occurs when a node needs to reclaim some of its RAM from Nswap for local processing. Page migration complicates reliability if, for example, two pages in the same parity group end up on the same node, because a single crash of that node would then lose more pages than the parity can reconstruct. We present novel reliability algorithms that solve these problems. Our Parity algorithm uses dynamic parity group membership to match Nswap's dynamic nature. We show that our algorithms add minimal overhead to remote swapping.
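The abstract does not describe the Parity algorithm in detail, so the following is only a minimal sketch of the general idea it names: XOR parity over a group whose membership can change, plus a check that migration never places two group members on the same node. The class ParityGroup, its methods, the 4096-byte page size, and the node names are all assumptions made for illustration and are not taken from Nswap.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


def xor_pages(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length pages."""
    return bytes(x ^ y for x, y in zip(a, b))


@dataclass
class ParityGroup:
    """Hypothetical parity group with dynamic membership (not Nswap's code)."""
    parity: bytes = bytes(PAGE_SIZE)              # XOR of all member pages
    members: dict = field(default_factory=dict)   # page_id -> node holding it

    def add(self, page_id: str, node: str, data: bytes) -> None:
        # Two members on one node could not both be rebuilt after a crash.
        if node in self.members.values():
            raise ValueError(f"node {node} already holds a member of this group")
        self.members[page_id] = node
        self.parity = xor_pages(self.parity, data)

    def remove(self, page_id: str, data: bytes) -> None:
        # XOR is its own inverse, so removing a page re-applies its data.
        del self.members[page_id]
        self.parity = xor_pages(self.parity, data)

    def can_migrate(self, page_id: str, dest: str) -> bool:
        # Migration is safe only if no other member already lives on dest.
        return all(node != dest
                   for pid, node in self.members.items() if pid != page_id)

    def recover(self, surviving_pages: list) -> bytes:
        # XOR of the parity with every surviving page yields the lost page.
        lost = self.parity
        for data in surviving_pages:
            lost = xor_pages(lost, data)
        return lost
```

In this sketch a group tolerates the loss of exactly one member page, which is why can_migrate rejects any destination node that already stores another page of the same group.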