BlueGene/L is a massively parallel cellular architecture system with a toroidal interconnect. Cellular architectures with a toroidal interconnect are effective at producing highly scalable computing systems, but they typically require job partitions to be both rectangular and contiguous. These restrictions introduce fragmentation, which lowers system utilization and increases the wait time and slowdown of queued jobs.

We propose to solve these problems for the BlueGene/L system through scheduling algorithms that augment a baseline first-come-first-serve (FCFS) scheduler. Restricting ourselves to space-sharing techniques, which are a simpler way to meet the requirements of cellular computing, we present simulation results for migration and backfilling on BlueGene/L. We evaluate these techniques both individually and in combination to determine their impact on the system. Our results show that migration is effective for a pure FCFS scheduler, but that backfilling yields even greater benefits. We also show that combining migration with backfilling creates further opportunities to utilize a parallel machine well.
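The interplay of contiguous rectangular partitions, FCFS, backfilling and migration can be made concrete with a small model. The Python below is a minimal sketch and not the simulator used in this work. It assumes a first-fit search over every orientation of a requested box, assumes partitions may wrap around each torus dimension, and uses hypothetical names (`Torus`, `schedule_pass`, `migrate`). Its `backfill` flag is a simplified, reservation-free variant: later jobs may start in any gap, with no check that they delay the job at the head of the queue (as EASY backfilling would require, using run-time estimates). The `migrate` routine assumes a simple compaction policy (clear the machine and repack running jobs largest first), which may differ from the policy evaluated in the paper.

```python
import copy
from itertools import permutations, product
from math import prod


class Torus:
    """Occupancy map of an X x Y x Z torus: each node holds a job id or None."""

    def __init__(self, dims):
        self.dims = tuple(dims)
        self.owner = {c: None for c in product(*(range(d) for d in self.dims))}

    def free_nodes(self):
        return sum(1 for j in self.owner.values() if j is None)

    def _box(self, base, shape):
        # Nodes covered by a box of `shape` anchored at `base`; indices wrap
        # around each dimension (assumed: partitions may use torus wraparound).
        return [tuple((b + o) % d for b, o, d in zip(base, off, self.dims))
                for off in product(*(range(s) for s in shape))]

    def find_partition(self, shape):
        # First fit over every orientation of the box and every base node.
        for rot in sorted(set(permutations(shape))):
            if any(s > d for s, d in zip(rot, self.dims)):
                continue
            for base in product(*(range(d) for d in self.dims)):
                box = self._box(base, rot)
                if all(self.owner[c] is None for c in box):
                    return box
        return None  # no free contiguous rectangular partition of this shape

    def allocate(self, job_id, shape):
        box = self.find_partition(shape)
        if box is None:
            return False
        for c in box:
            self.owner[c] = job_id
        return True

    def release(self, job_id):
        for c in [c for c, j in self.owner.items() if j == job_id]:
            self.owner[c] = None


def schedule_pass(torus, queue, running, backfill=False):
    """One pass over `queue`, a list of (job_id, shape) in arrival order.

    Pure FCFS stops at the first job that cannot be placed. With backfill=True,
    later jobs may start in the gaps (simplified: no head-job reservation).
    """
    started = []
    for job_id, shape in list(queue):
        if torus.allocate(job_id, shape):
            queue.remove((job_id, shape))
            running[job_id] = shape
            started.append(job_id)
        elif not backfill:
            break
    return started


def migrate(torus, running):
    """Hypothetical compaction: repack running jobs, largest first.

    Restores the previous layout if any job cannot be re-placed.
    """
    saved = dict(torus.owner)
    for c in torus.owner:
        torus.owner[c] = None
    for job_id, shape in sorted(running.items(), key=lambda kv: -prod(kv[1])):
        if not torus.allocate(job_id, shape):
            torus.owner = saved
            return False
    return True


if __name__ == "__main__":
    ring = Torus((8, 1, 1))  # an 8x1x1 torus (a ring) keeps the layout easy to trace
    running = {}
    schedule_pass(ring, [(j, (2, 1, 1)) for j in "ABCD"], running)
    for j in "AC":  # jobs A and C finish, leaving two separate 2-node holes
        ring.release(j)
        del running[j]
    print(ring.free_nodes(), ring.find_partition((4, 1, 1)))  # 4 None: fragmented
    queue = [("E", (4, 1, 1)), ("F", (2, 1, 1))]
    print(schedule_pass(copy.deepcopy(ring), list(queue), dict(running), backfill=True))  # ['F']
    print(schedule_pass(ring, queue, running))  # []: FCFS blocks behind E
    migrate(ring, running)
    print(schedule_pass(ring, queue, running))  # ['E'] after compaction
```

The demo reproduces, in miniature, the effects the abstract describes: four nodes are free but no contiguous four-node partition exists, so pure FCFS stalls; backfilling fills a gap with the smaller job, and migration consolidates the free nodes so the blocked job can start.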