Optimization and analysis of distributed averaging with short node memory

  • Authors:
  • Boris N. Oreshkin; Mark J. Coates; Michael G. Rabbat

  • Affiliations:
  • Telecommunications and Signal Processing-Computer Networks Laboratory, Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada

  • Venue:
  • IEEE Transactions on Signal Processing
  • Year:
  • 2010

Abstract

Distributed averaging describes a class of network algorithms for the decentralized computation of aggregate statistics. Initially, each node has a scalar data value, and the goal is to compute the average of these values at every node (the so-called average consensus problem). Nodes iteratively exchange information with their neighbors and perform local updates until the value at every node converges to the initial network average. Much previous work has focused on algorithms where each node maintains and updates a single value; every time an update is performed, the previous value is forgotten. Convergence to the average consensus is achieved asymptotically. The convergence rate is fundamentally limited by network connectivity, and it can be prohibitively slow on topologies such as grids and random geometric graphs, even if the update rules are optimized. In this paper, we provide the first theoretical demonstration that adding a local prediction component to the update rule can significantly improve the convergence rate of distributed averaging algorithms. We focus on the case where the local predictor is a linear combination of the node's current and previous values (i.e., two memory taps), and our update rule computes a combination of the predictor and the usual weighted linear combination of values received from neighboring nodes. We derive the optimal mixing parameter for combining the predictor with the neighbors' values, and conduct a theoretical analysis of the improvement in convergence rate that can be achieved using this acceleration methodology. For a chain topology on N nodes, this leads to a factor of N improvement over standard consensus, and for a two-dimensional grid, our approach achieves a factor of √N improvement.
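The update rule described in the abstract mixes the usual one-hop weighted averaging step with a two-tap linear predictor formed from each node's current and previous values. The Python sketch below illustrates this general idea only; the mixing parameter `beta` and predictor coefficients `theta` are placeholder values (the paper derives the optimal mixing parameter analytically), and the weight matrix construction is a simple symmetric choice for a chain, not necessarily the one used in the paper's experiments.

```python
import numpy as np

def accelerated_consensus(W, x0, beta=0.1, theta=(2.0, -1.0), num_iters=200):
    """Sketch of distributed averaging with a two-tap local predictor.

    W      : symmetric, doubly stochastic weight matrix (N x N); W[i, j] > 0
             only if nodes i and j are neighbors (or i == j).
    x0     : initial node values (length-N vector).
    beta   : mixing parameter between predictor and consensus step
             (placeholder; the paper derives the optimal value).
    theta  : predictor coefficients on the current and previous values
             (placeholders; they should sum to 1 so that a consensus
             state remains a fixed point of the iteration).
    """
    x_prev = x0.copy()
    x_curr = x0.copy()
    for _ in range(num_iters):
        consensus = W @ x_curr                                 # standard one-hop averaging
        predictor = theta[0] * x_curr + theta[1] * x_prev      # two-tap linear extrapolation
        x_next = (1.0 - beta) * consensus + beta * predictor   # mix predictor with consensus
        x_prev, x_curr = x_curr, x_next
    return x_curr

if __name__ == "__main__":
    # Chain topology on N nodes with symmetric, doubly stochastic weights.
    N = 20
    W = np.zeros((N, N))
    for i in range(N - 1):
        W[i, i + 1] = W[i + 1, i] = 0.5
    W += np.diag(1.0 - W.sum(axis=1))

    x0 = np.random.rand(N)
    x_final = accelerated_consensus(W, x0)
    print("true average:", x0.mean())
    print("node estimates (first 3):", x_final[:3])
```

Because `theta[0] + theta[1] = 1` and the weight matrix is doubly stochastic, each iteration preserves the network average, so the node values converge to the initial mean rather than some other fixed point.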