Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these applications can suffer severe performance penalties if their processes are not all coscheduled to run together. Two common approaches to coscheduling jobs are batch scheduling, wherein nodes are dedicated for the duration of the run, and gang scheduling, wherein time slicing is coordinated across processors. Both work well when jobs are load-balanced and make use of the entire parallel machine. However, these conditions are rarely met, so most realistic workloads suffer from both internal and external fragmentation, in which resources and processors are left idle because jobs cannot be packed with perfect efficiency. This fragmentation reduces utilization and leads to suboptimal performance. Flexible CoScheduling (FCS) addresses this problem by monitoring each job's computation granularity and communication pattern and scheduling jobs based on their synchronization and load-balancing requirements. In particular, jobs that do not require stringent synchronization are identified and are not coscheduled; instead, their processes are used to reduce fragmentation. FCS has been fully implemented on top of the STORM resource manager on a 256-processor Alpha cluster and compared to batch, gang, and implicit coscheduling algorithms. This paper describes in detail the implementation of FCS and its performance evaluation with a variety of workloads, including large-scale benchmarks, scientific applications, and dynamic workloads. The experimental results show that FCS saturates at higher loads than the other algorithms (up to 54 percent higher in some cases) and displays lower response times and slowdown than the other algorithms in nearly all scenarios.
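The classification idea described above can be illustrated with a minimal sketch. It is not the FCS implementation: the `ProcessStats` record, the `FINE_GRAIN_US` threshold, and the function names are all hypothetical, and the sketch uses only average computation granularity, ignoring the communication-pattern and load-balancing measurements the abstract also mentions. It shows one way a scheduler might separate jobs that need coscheduling from jobs that can be used to fill idle resources.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen only for illustration (not taken from the paper):
# jobs that compute for less than this between communication events are treated
# as fine-grained and therefore as needing stringent synchronization.
FINE_GRAIN_US = 1000.0


@dataclass
class ProcessStats:
    """Hypothetical per-job measurements gathered by a monitoring layer."""
    compute_us: float   # total time spent computing, in microseconds
    comm_events: int    # number of communication events observed


def granularity_us(stats: ProcessStats) -> float:
    """Average computation time between communication events."""
    if stats.comm_events == 0:
        return float("inf")  # never communicates: infinitely coarse-grained
    return stats.compute_us / stats.comm_events


def needs_coscheduling(stats: ProcessStats) -> bool:
    """Assumed rule: fine-grained jobs are the ones that must run together."""
    return granularity_us(stats) < FINE_GRAIN_US


def partition(jobs: dict[str, ProcessStats]) -> tuple[list[str], list[str]]:
    """Split jobs into (coscheduled, fillers), preserving insertion order."""
    coscheduled = [name for name, s in jobs.items() if needs_coscheduling(s)]
    fillers = [name for name, s in jobs.items() if not needs_coscheduling(s)]
    return coscheduled, fillers


if __name__ == "__main__":
    jobs = {
        "stencil": ProcessStats(compute_us=50_000, comm_events=500),
        "montecarlo": ProcessStats(compute_us=5_000_000, comm_events=2),
        "serial": ProcessStats(compute_us=1_000_000, comm_events=0),
    }
    print(partition(jobs))
```

In this sketch the filler jobs are the ones a scheduler would be free to run in otherwise idle slots; a real system would presumably re-measure and reclassify jobs as their behavior changes.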