Evaluating the Impact of Communication Architecture on the Performability of Cluster-Based Services

  • Authors:
  • Kiran Nagaraja;Neeraj Krishnan;Ricardo Bianchini;Richard P. Martin;Thu D. Nguyen

  • Venue:
  • HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
  • Year:
  • 2003

Abstract

We consider the impact of different communication architectures on the performability (performance + availability) of cluster-based servers. In particular, we use a combination of fault-injection experiments and analytic modeling to evaluate the performability of two popular communication protocols, TCP and VIA, as the intra-cluster communication substrate of a sophisticated Web server. Our analysis leads to several interesting conclusions, the most surprising of which is that, under the same fault load, VIA-based servers deliver greater availability than TCP-based servers. If we assume higher fault rates for VIA-based servers because the underlying technology is more immature and the programming model more complex, we find that packet errors or application faults would have to occur at approximately 4 times the rate in TCP-based servers before their performabilities equalize. We use our results from the study to suggest that high-performance and robust communication layers for highly available cluster-based servers should preserve message boundaries (as opposed to using byte streams), use single-copy transfers, pre-allocate channel resources, and report errors in a manner consistent with the network fabric's fault model.
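
To make the message-boundary recommendation concrete, the following is a minimal sketch of what a server must add on top of a byte stream such as TCP to recover discrete messages. It is not the authors' implementation. The 4-byte length prefix and the names `send_msg`, `recv_exact`, `recv_msg`, and `demo` are assumptions chosen for illustration, and a local socket pair stands in for an intra-cluster connection.

```python
import socket
import struct

# Assumed framing: a 4-byte big-endian length prefix before each payload.
HEADER = struct.Struct("!I")


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one message as <length><payload> over a byte stream."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a stream may return fewer per recv() call."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            # The peer closed the stream in the middle of a message.
            raise ConnectionError("peer closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock: socket.socket) -> bytes:
    """Receive one complete message, restoring the sender's boundary."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def demo() -> list:
    """Send three messages back to back and read them as separate units."""
    a, b = socket.socketpair()  # stand-in for an intra-cluster link
    try:
        for payload in (b"GET /index.html", b"", b"cache-lookup:42"):
            send_msg(a, payload)
        return [recv_msg(b) for _ in range(3)]
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())
```

With a transport that preserves message boundaries, the framing and the partial-read loop in `recv_exact` would not be needed, which is one way to read the abstract's preference for message-based communication layers over byte streams.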