Looking under the hood of the IBM Blue Gene/Q network

Authors:
Dong Chen;Noel Eisley;Philip Heidelberger;Sameer Kumar;Amith Mamidala;Fabrizio Petrini;Robert Senger;Yutaka Sugawara;Robert Walkup;Burkhard Steinmacher-Burow;Anamitra Choudhury;Yogish Sabharwal;Swati Singhal;Jeffrey J. Parker
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM Deutschland Research & Development GmbH, Böblingen, Germany;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM Systems & Technology Group, Systems Hardware Development, Rochester, MN
Venue:
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2012

Citing 14
Cited 2

Adaptive Bubble Router: A Design to Improve Performance in Torus Networks

ICPP '99 Proceedings of the 1999 International Conference on Parallel Processing
The BlackWidow High-Radix Clos Network

Proceedings of the 33rd annual international symposium on Computer Architecture
Overview of the IBM Blue Gene/P project

IBM Journal of Research and Development
Technology-Driven, Highly-Scalable Dragonfly Topology

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Optimization of All-to-All Communication on the Blue Gene/L Supercomputer

ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
HPCC RandomAccess benchmark for next generation supercomputers

IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Overview of the Blue Gene/L system architecture

IBM Journal of Research and Development
The PERCS High-Performance Interconnect

HOTI '10 Proceedings of the 2010 18th IEEE Symposium on High Performance Interconnects
The Gemini System Interconnect

HOTI '10 Proceedings of the 2010 18th IEEE Symposium on High Performance Interconnects
The IBM Blue Gene/Q interconnection network and message unit

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
The IBM Blue Gene/Q Compute Chip

IEEE Micro
PAMI: A Parallel Active Message Interface for the Blue Gene/Q Supercomputer

IPDPS '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium
The Tofu Interconnect

IEEE Micro
The IBM Blue Gene/Q Interconnection Fabric

IEEE Micro

Warp speed: executing time warp on 1,966,080 cores

Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation
LEF: long edge first routing for two-dimensional mesh network on chip

Proceedings of the Sixth International Workshop on Network on Chip Architectures

Abstract

This paper explores the performance and optimization of the IBM Blue Gene/Q (BG/Q) five-dimensional torus network on up to 16K nodes. The BG/Q hardware supports multiple dynamic routing algorithms, and different traffic patterns may require different algorithms to achieve the best performance. Between 85% and 95% of peak network performance is achieved for all-to-all traffic, while over 85% of peak is obtained for challenging bisection pairings. A new software-controlled algorithm is developed for bisection traffic; it selects which hardware algorithm to employ and achieves better performance than any individual hardware algorithm. The benefit of dynamic routing is shown for a highly non-uniform "transpose" traffic pattern. To evaluate memory and network performance, the HPCC RandomAccess benchmark was tuned for BG/Q and achieved 858 Giga Updates per Second (GUPS) on 16K nodes. To further accelerate message processing, the message libraries on BG/Q enable the offloading of messaging overhead onto dedicated communication threads. Several applications, including Algebraic Multigrid (AMG), exhibit gains of 3% to 20% using communication threads.
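
As a rough illustration of two quantities mentioned in the abstract, the sketch below computes the minimal hop count between two nodes of a torus and the average RandomAccess rate per node. It is not taken from the paper: the function names and the torus shape `DIMS` are hypothetical, and "16K nodes" is assumed to mean 16,384.

```python
def torus_hops(a, b, dims):
    """Minimal hop count between coordinates a and b on a torus.

    In each dimension a packet can travel in either direction around
    the ring, so the per-dimension distance is the shorter of the two.
    """
    return sum(min((x - y) % n, (y - x) % n) for x, y, n in zip(a, b, dims))


# Hypothetical 5D torus shape, chosen only for illustration
# (the abstract does not state the dimension sizes).
DIMS = (4, 4, 4, 4, 2)


def per_node_gups(total_gups=858.0, nodes=16 * 1024):
    """Average GUPS per node, assuming 16K means 16,384 nodes."""
    return total_gups / nodes


if __name__ == "__main__":
    print(torus_hops((0, 0, 0, 0, 0), (3, 2, 1, 0, 1), DIMS))
    print(round(per_node_gups(), 4))
```

Under these assumptions, the reported 858 GUPS corresponds to roughly 0.052 GUPS per node, or about 52 million updates per second per node on average.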