Scalable Fault-Tolerant Aggregation in Large Process Groups

  • Authors:
  • Indranil Gupta;Robbert van Renesse;Kenneth P. Birman

  • Affiliations:
  • -;-;-

  • Venue:
  • DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

Abstract: This paper discusses fault-tolerant, scalable solutions to the problem of accurately and scalably calculating global aggregate functions in large process groups communicating over unreliable networks. These groups could represent sensors or processes communicating over a network that is either fixed (e.g., the Internet) or dynamic (eg., multihop ad-hoc). Group members are prone to failures. The ability to evaluate global aggregate properties (eg., the average of sensor temperature readings) is important for higher-level coordination activities in such large groups. We first define the setting and problem, laying down metrics to evaluate different algorithms for the same. We discuss why the usual approaches to solve this problem are unviable and unscalable over an unreliable network prone to message delivery failures and crash failures. We then propose a technique to impose an abstract hierarchy on such large groups, describing how this hierarchy can be made to mirror the network topology. We discuss several alternatives to use this technique to solve the global aggregate function evaluation problem. Finally, we present a protocol based on gossiping that uses this hierarchical technique. We present mathematical analysis and performance results to validate the robustness, efficiency and accuracy of the Hierarchical Gossiping algorithm.