SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  • Authors:
  • Ke Wang, Kevin Brandstatter, Ioan Raicu

  • Affiliations:
  • Illinois Institute of Technology, Chicago, IL (Wang, Brandstatter, Raicu); Argonne National Laboratory, Argonne, IL (Raicu)

  • Venue:
  • Proceedings of the High Performance Computing Symposium
  • Year:
  • 2013


Abstract

Exascale computers (expected to be composed of millions of nodes and billions of threads of execution) will enable the unraveling of significant scientific mysteries. Many-task computing is a distributed paradigm that can potentially address three of the four major challenges of exascale computing, namely Memory/Storage, Concurrency/Locality, and Resiliency. Exascale computing will require job scheduling/management systems that are several orders of magnitude more efficient than the state-of-the-art, which tends to have a centralized architecture and is relatively heavy-weight. This paper proposes a light-weight discrete event simulator, SimMatrix, which simulates job scheduling systems comprising millions of nodes and billions of cores/tasks. SimMatrix supports both centralized (e.g. first-in-first-out) and distributed (e.g. work stealing) scheduling. We validated SimMatrix against two real systems, Falkon and MATRIX, with up to 4K cores, running on an IBM Blue Gene/P system, and compared SimMatrix with SimGrid and GridSim in terms of resource consumption at scale. Results show that SimMatrix consumes up to two orders of magnitude less memory per task, and incurs at least one order of magnitude (and up to four orders of magnitude) lower time-per-task overheads. For example, running a workload of 10 billion tasks on 1 million nodes and 1 billion cores required 142 GB of memory and 163 CPU-hours. These relatively low costs at exascale levels of concurrency will enable innovative studies of scheduling algorithms at unprecedented scales.
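To make the two scheduling modes concrete, the sketch below is a minimal discrete-event simulation of distributed work stealing: idle nodes steal half of a random victim's queue, and an event heap orders node wake-ups by simulated time. This is an illustrative assumption of how such a simulator can be structured, not SimMatrix's actual implementation or API; all names (`Node`, `simulate`, the polling interval) are hypothetical.

```python
import heapq
import random

class Node:
    """A simulated compute node with a local ready-task queue."""
    def __init__(self, node_id):
        self.id = node_id
        self.queue = []  # pending task durations (seconds of simulated time)

def simulate(num_nodes, task_durations, seed=0):
    """Discrete-event sketch of work stealing; returns the makespan.

    Illustrative assumptions: all tasks start on node 0 (centralized
    submission), an idle node steals half of a random victim's queue,
    and a failed steal retries after a fixed 1 ms polling interval.
    """
    rng = random.Random(seed)
    nodes = [Node(i) for i in range(num_nodes)]
    nodes[0].queue = list(task_durations)      # submit everything to node 0
    events = [(0.0, i) for i in range(num_nodes)]  # (time node becomes idle, id)
    heapq.heapify(events)
    completed, makespan = 0, 0.0
    while completed < len(task_durations):
        now, nid = heapq.heappop(events)
        node = nodes[nid]
        if not node.queue:
            # Work stealing: pick a random victim, take half its queue.
            victim = rng.choice(nodes)
            steal = len(victim.queue) // 2
            if victim is not node and steal > 0:
                node.queue = victim.queue[:steal]
                victim.queue = victim.queue[steal:]
        if node.queue:
            dur = node.queue.pop()
            completed += 1
            makespan = max(makespan, now + dur)
            heapq.heappush(events, (now + dur, nid))  # idle again at finish
        else:
            heapq.heappush(events, (now + 0.001, nid))  # retry steal later
    return makespan
```

With a single node the makespan is simply the serial sum of task durations; adding nodes lets steals spread the queue, bounding the makespan between the ideal `total_work / num_nodes` and the serial time. The same event-heap skeleton also covers the centralized FIFO case by replacing the steal step with dispatch from one global queue.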