Improved point-to-point and collective communication performance with output-queued high-radix routers

  • Authors:
  • Sameer Kumar;Craig Stunkel;Laxmikant V. Kalé

  • Affiliations:
  • IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY;Department of Computer Science, University of Illinois at Urbana-Champaign

  • Venue:
  • HiPC'05 Proceedings of the 12th international conference on High Performance Computing
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present an output-queued switch architecture with cross-point buffering that has improved performance for both point-to-point communication and hardware accelerated collective communication. In the past, output queuing architectures have been less popular as they require more internal speedup and buffering. However, with current technology it is possible to build output-queued switches with a relatively large number of ports. We demonstrate that our output-queued architecture performs well for point-to-point messages, specially in a fat-tree topology. We also show that output-queued architectures facilitate efficient implementations of multicasts and reductions. We present performance of multicasts and reductions on individual switches and a network of switches interconnected in a fat-tree topology. We also present simulation results based on synthetic workloads that emulate a molecular dynamics application.