A lightweight, high performance communication protocol for grid computing

  • Authors:
  • Phillip M. Dickens

  • Affiliations:
  • Department of Computer Science, University of Maine, Orono, USA

  • Venue:
  • Cluster Computing
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes a lightweight, high-performance communication protocol for the high-bandwidth, high-delay networks typical of computational Grids. One unique feature of this protocol is that it incorporates an extremely accurate classification mechanism that is efficient enough to diagnose the cause of data loss in real time, providing to the controller the opportunity to respond to different causes of data loss in different ways. The simplest adaptive response, and the one discussed in this paper, is to trigger aggressive congestion control measures only when the data loss is diagnosed as network related. However, even this very simple adaptation can have a tremendous impact on performance in a Grid setting where the resources allocated to a long-running, data-intensive application can fluctuate significantly during the course of its execution. In fact, we provide results showing that the utilization of the information provided by the classifier increased performance by over two orders of magnitude depending on the dominant cause of data loss. In this paper, we discuss the Bayesian statistical framework upon which the classifier is based and the classification metrics that make this approach highly successful. We discuss the integration of the classifier into the congestion control structures of an existing high-performance communication protocol, and provide empirical results showing that it correctly diagnosed the cause of data loss in over 98% of the experimental trials.