Performance implications of remote-only load balancing under adversarial traffic in Dragonflies

  • Authors:
  • Bogdan Prisacari;German Rodriguez;Marina Garcia;Enrique Vallejo;Ramon Beivide;Cyriel Minkenberg

  • Affiliations:
  • IBM Research -- Zurich, Ruschlikon, Switzerland;IBM Research -- Zurich, Ruschlikon, Switzerland;IBM Research -- Zurich, Ruschlikon, Switzerland;University of Cantabria, Santander, Spain;University of Cantabria, Santander, Spain;IBM Research -- Zurich, Ruschlikon, Switzerland

  • Venue:
  • Proceedings of the 8th International Workshop on Interconnection Network Architecture: On-Chip, Multi-Chip
  • Year:
  • 2014

Abstract

Dragonfly topologies are recent network designs that are considered among the most promising interconnect options for Exascale systems. They offer a low diameter and low network cost, but do so at the expense of path diversity, which makes them vulnerable to certain adversarial traffic patterns. Indirect routing approaches can alleviate the performance degradation that these workloads experience. However, the improvements achievable with the indirect routing approach that is popular today are bounded by limits inherent to the Dragonfly topological structure. In this work, we explore these limits by providing a theoretical justification for why adversarial traffic patterns, routed indirectly with an algorithm that perfectly distributes load across inter-Dragonfly-group links, can still induce significant bottlenecks in the intra-group links. We also provide estimates of the performance impact of these imbalances and present a set of simulation-based benchmarks that confirm the theoretical predictions for practical Dragonfly systems.
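
The intra-group bottleneck described above can be illustrated with a small counting sketch. It is not the authors' model. It assumes a hypothetical Dragonfly with `a` routers per group, `h` global links per router, `g = a*h + 1` groups, and a consecutive global-link layout in which link `j` of group `G` leads to group `(G + j + 1) % g` and is owned by router `j // h`. Every group sends to the group `offset` positions ahead, and each flow is spread evenly over all possible intermediate groups, so the global links are perfectly balanced by construction. The function name and parameters are illustrative, and only the local hops inside intermediate groups are counted.

```python
from collections import Counter


def intermediate_local_loads(a, h, offset):
    """Count flows crossing each directed local link inside intermediate groups.

    Assumptions (not taken from the abstract): g = a*h + 1 groups, a routers
    per group, h global links per router, and a consecutive global-link
    layout where link j of group G leads to group (G + j + 1) % g and is
    owned by router j // h. offset must not be a multiple of g.
    """
    g = a * h + 1
    loads = Counter()
    for src in range(g):
        dst = (src + offset) % g
        for mid in range(g):
            if mid in (src, dst):
                continue  # an intermediate group differs from source and destination
            # Router of `mid` that owns the global link shared with `src`.
            r_in = ((src - mid - 1) % g) // h
            # Router of `mid` that owns the global link leading to `dst`.
            r_out = ((dst - mid - 1) % g) // h
            if r_in != r_out:
                # One balanced share of traffic takes this local hop.
                loads[(mid, r_in, r_out)] += 1
    return loads


if __name__ == "__main__":
    for off in (1, 2):
        loads = intermediate_local_loads(a=4, h=2, offset=off)
        print(off, max(loads.values()))
```

Each count is the number of incoming global links whose traffic is funneled onto one local link. Under these assumptions, with `a=4` and `h=2`, an offset of `h` makes the busiest local link carry the traffic of 2 global links, while an offset of 1 gives a maximum of 1, even though the global links are equally loaded in both cases.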