Reliable, scalable tree-based overlay networks

  • Authors:
  • Barton P. Miller; Dorian Cecil Arnold

  • Affiliations:
  • The University of Wisconsin - Madison; The University of Wisconsin - Madison

  • Venue:
  • Reliable, scalable tree-based overlay networks
  • Year:
  • 2008

Abstract

As high performance computing (HPC) systems continue to increase in size, reliable and scalable computational models become critical. Tree-based overlay networks (TBŌNs) help to address scalability by providing scalable data multicast, data gather, and data aggregation services. In this dissertation, we address the reliability challenges of TBŌN-based HPC tools and applications. We exploit the characteristics of many TBŌN computations to develop a new failure recovery model, state compensation, that uses: (1) inherently redundant information from processes that survive failures to compensate for information lost due to failures, (2) weak data consistency to relax the constraints of the recovery mechanisms, and (3) protocols that allow processes to recover from failures independently. State compensation requires no additional computational, network or storage resources in the absence of failures. When failures do occur, a small subset of TBŌN processes participate in failure recovery, so failure recovery is scalable. We developed a formal specification of our data aggregation model that allowed us to validate our failure recovery mechanisms and identify their requirements and limitations. Generally, state compensation requires that data aggregation operations be commutative and associative. Our primary compensation mechanism requires that the data aggregation operation be idempotent. Our second compensation mechanism addresses non-idempotent data aggregation operations using more complex recovery mechanisms. We studied tree reconfiguration algorithms for high performance TBŌNs, focusing on the algorithms' execution times and the costs of managing the TBŌN process information needed by the reconfiguration algorithms. We also considered the data aggregation latency of the resulting configurations, and concluded that this should be the primary consideration for TBŌNs with up to one million application processes. 
We recommend an algorithm that considers all TBŌN processes but restricts increases in tree height, since height increases can have a significant negative impact on data aggregation performance. Also, we implemented our primary state compensation mechanisms. Our experiments with this framework confirm that for TBŌNs that can support millions of application processes, state compensation can yield low failure recovery latencies and inconsequential application perturbation.
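
The idempotence requirement of the primary compensation mechanism can be illustrated with a small sketch. It is not the dissertation's implementation: the names `Node`, `merge`, `send_up`, and `compensate` are hypothetical, and set union stands in for any aggregation that is commutative, associative, and idempotent. When an internal process fails, its orphaned children attach to a surviving ancestor and re-send their own state; because merging the same data twice changes nothing, the overlap with data the ancestor already holds is harmless.

```python
class Node:
    """A TBON process holding aggregated state (hypothetical model)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.state = frozenset()

    def merge(self, data):
        # Set union is commutative, associative, and idempotent.
        self.state = self.state | frozenset(data)

    def send_up(self):
        # Forward this node's current state to its parent, if any.
        if self.parent is not None:
            self.parent.merge(self.state)


def compensate(orphans, new_parent):
    """Re-attach orphans to a surviving process and re-send their state."""
    for orphan in orphans:
        orphan.parent = new_parent
        orphan.send_up()


root = Node("root")
mid = Node("mid", parent=root)
a = Node("a", parent=mid)
b = Node("b", parent=mid)

# a's data reaches the root through mid.
a.merge({1, 2})
a.send_up()
mid.send_up()

# b's data reaches mid, but mid fails before forwarding it.
b.merge({2, 3})
b.send_up()

# Recovery involves only the orphans and their new parent.
compensate([a, b], root)
print(sorted(root.state))  # [1, 2, 3]
```

With a non-idempotent operation such as a sum, the same re-send would count `a`'s contribution twice, which is why the abstract describes a second, more complex mechanism for that case.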