Warp speed: executing time warp on 1,966,080 cores
Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation
A low-latency, low-diameter interconnection network will be an important component of future exascale architectures. The dragonfly network topology, a two-level directly connected network, is a candidate for such architectures because of its low diameter and reduced latency. To date, the dragonfly topology has been examined in small-scale simulations with a few thousand nodes. However, future exascale machines will have millions of cores and up to 1 million nodes. In this paper, we focus on the modeling and simulation of large-scale dragonfly networks using the Rensselaer Optimistic Simulation System (ROSS). We validate the results of our model against the cycle-accurate simulator booksim and compare the performance of booksim and ROSS for the dragonfly network model at modest scales. We demonstrate the performance of ROSS on both the IBM Blue Gene/P and IBM Blue Gene/Q systems on a dragonfly model with up to 50 million nodes, showing a peak event rate of 1.33 billion events/second and a total of 872 billion committed events. The dragonfly network model for million-node configurations shows strong scaling when going from 1,024 to 65,536 MPI tasks on both systems. We also explore a variety of ROSS tuning parameters to find the settings that perform best with the dragonfly network model.