A fast topology inference: a building block for network-aware parallel processing

  • Authors:
  • Tatsuya Shirai;Hideo Saito;Kenjiro Taura

  • Affiliations:
  • University of Tokyo;University of Tokyo;University of Tokyo

  • Venue:
  • Proceedings of the 16th international symposium on High performance distributed computing
  • Year:
  • 2007

Abstract

Adapting to the network is the key to achieving high performance for communication-intensive applications, including scientific computing, data-intensive computing, and multicast, especially in Grid environments. This paper investigates an approach that represents the network as a tree of participating hosts and switches matching or approximating their physical topology, and describes a fast, non-intrusive, and portable algorithm for inferring such a topology. This representation and the proposed inference algorithm serve as a key to building network-aware applications in a portable manner. The algorithm is based solely on RTTs of small packets between end hosts; it does not rely on popular but not universally available protocols such as traceroute and SNMP. Another benefit is that it can handle all layers of the network uniformly without any a priori knowledge of cluster configurations. The required number of measurements is O(Nd) under certain idealizing assumptions made for the purpose of analysis, where N is the number of participating processes and d is the diameter of the network, which is usually small in real networks. In our experimental environment, the inference algorithm built a topology of 64 hosts in a single cluster in 4 seconds and that of 256 hosts across 4 clusters in 15 seconds. It not only identifies clusters within a Grid but also partially identifies the Layer 2 topology within a cluster, which is important for optimizing bandwidth-limited operations such as broadcast. We built several network-aware applications upon the inference system, including efficient bandwidth measurements and long message broadcasts. In the bandwidth measurements, the topology is used to schedule as many measurements as possible in parallel without competing on shared links. We were able to build a bandwidth map of 256 hosts across 4 clusters in 27 seconds.
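
To illustrate how a tree topology can be used to run measurements in parallel without competing on shared links, here is a minimal Python sketch. It is not the authors' scheduler: the greedy first-fit packing, the `parent` map encoding of the tree, the function names `path_edges` and `schedule_rounds`, and the host and switch labels are all assumptions made for this example. Links are treated as undirected, so two measurements conflict whenever their tree paths share any link.

```python
def path_edges(parent, a, b):
    """Return the set of tree links on the path between nodes a and b.

    `parent` maps each non-root node to its parent (hypothetical encoding).
    Each link is represented as the pair (child, parent).
    """
    def ancestors(node):
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    up_a, up_b = ancestors(a), ancestors(b)
    common = set(up_a) & set(up_b)
    edges = set()
    for chain in (up_a, up_b):
        for node in chain:
            if node in common:  # reached the lowest common ancestor
                break
            edges.add((node, parent[node]))
    return edges


def schedule_rounds(parent, pairs):
    """Greedily pack host pairs into rounds whose paths share no link.

    First-fit heuristic: each pair joins the earliest round in which its
    path is link-disjoint from every pair already placed there.
    """
    rounds = []  # list of (links used in this round, pairs in this round)
    for a, b in pairs:
        edges = path_edges(parent, a, b)
        for used, members in rounds:
            if used.isdisjoint(edges):
                used |= edges  # in-place update of the round's link set
                members.append((a, b))
                break
        else:
            rounds.append((set(edges), [(a, b)]))
    return [members for _, members in rounds]


# Hypothetical two-switch cluster: hosts h1, h2 under s1; h3, h4 under s2.
parent = {"h1": "s1", "h2": "s1", "h3": "s2", "h4": "s2",
          "s1": "s0", "s2": "s0"}
pairs = [("h1", "h2"), ("h3", "h4"), ("h1", "h3"), ("h2", "h4")]
rounds = schedule_rounds(parent, pairs)
```

In this example the two intra-switch pairs fit in one round because their paths are link-disjoint, while the two cross-switch pairs both traverse the uplinks to `s0` and therefore land in separate rounds. A first-fit pass does not guarantee the minimum number of rounds; it only guarantees that no round contains two measurements on the same link.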