SimJAVA—a framework for modeling queueing networks in Java
Proceedings of the 29th conference on Winter simulation
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Condor: a distributed job scheduler
Beowulf cluster computing with Linux
MPI: The Complete Reference
The portable batch scheduler and the maui scheduler on linux clusters
ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
Falkon: a Fast and Light-weight tasK executiON framework
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Overview of the IBM Blue Gene/P project
IBM Journal of Research and Development
SimGrid: A Generic Framework for Large-Scale Distributed Experiments
UKSIM '08 Proceedings of the Tenth International Conference on Computer Modeling and Simulation
Scheduling multithreaded computations by work stealing
SFCS '94 Proceedings of the 35th Annual Symposium on Foundations of Computer Science
Toward loosely coupled programming on petascale systems
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
The quest for scalable support of data-intensive workloads in distributed systems
Proceedings of the 18th ACM international symposium on High performance distributed computing
Introduction to Algorithms, Third Edition
Introduction to Algorithms, Third Edition
Many-task computing: bridging the gap between high-throughput computing and high-performance computing
Middleware support for many-task computing
Cluster Computing
Making a case for distributed file systems at Exascale
Proceedings of the third international workshop on Large-scale system and application performance
Opportunities and Challenges in Running Scientific Workflows on the Cloud
CYBERC '11 Proceedings of the 2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery
Experiences in running workloads over grid3
GCC'05 Proceedings of the 4th international conference on Grid and Cooperative Computing
Parallel Simulation of Peer-to-Peer Systems
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Exploring reliability of exascale systems through simulations
Proceedings of the High Performance Computing Symposium
Using simulation to explore distributed key-value stores for extreme-scale system services
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
Exascale computers (expected to be composed of millions of nodes and billions of threads of execution) will enable the unraveling of significant scientific mysteries. Many-task computing is a distributed paradigm, which can potentially address three of the four major challenges of exascale computing, namely Memory/Storage, Concurrency/Locality, and Resiliency. Exascale computing will require efficient job scheduling/management systems that are several orders of magnitude beyond the state-of-the-art, which tend to have centralized architecture and are relatively heavy-weight. This paper proposes a light-weight discrete event simulator, SimMatrix, which simulates job scheduling system comprising of millions of nodes and billions of cores/tasks. SimMatrix supports both centralized (e.g. first-in-first-out) and distributed (e.g. work stealing) scheduling. We validated SimMatrix against two real systems, Falkon and MATRIX, with up to 4K-cores, running on an IBM Blue Gene/P system, and compared SimMatrix with SimGrid and GridSim in terms of resource consumption at scale. Results show that SimMatrix consumes up to two-orders of magnitude lower memory per task, and at least one-order of magnitude (and up to four-orders of magnitude) lower time per task overheads. For example, running a workload of 10 billion tasks on 1 million nodes and 1 billion cores required 142GB memory and 163 CPU-hours. These relatively low costs at exascale levels of concurrency will lead to innovative studies in scheduling algorithms at unprecedented scales.