In most parallel supercomputers, submitting a job for execution involves specifying how many processors are to be allocated to the job. When the job is moldable (i.e., the number of processors it uses can be chosen at submission time), an application scheduler called SA can significantly improve job performance by automatically selecting how many processors to request. Since most jobs are moldable, this result has great impact on the current state of practice in supercomputer scheduling. However, widespread use of SA can change the nature of the workload processed by supercomputers. When many SAs schedule jobs on one supercomputer, the decision made by one SA affects the state of the system and therefore the other instances of SA. In this case, the global behavior of the system emerges from the aggregate behavior of all SAs. In particular, it is reasonable to expect the competition for resources to become tougher with multiple SAs, and this tougher competition to decrease the performance improvement attained by each SA individually. This paper investigates this very issue. We found that the increased competition indeed makes it harder for each individual instance of SA to improve job performance. Nevertheless, two other aggregate behaviors override the increased competition when the system load is moderate to heavy. First, as load rises, SA chooses smaller requests, which increases efficiency, effectively decreasing the offered load and mitigating long wait times. Second, better job packing and fewer jobs in the system make it easier for incoming jobs to fit into the supercomputer schedule, reducing wait times further. As a result, under moderate to heavy load, a single instance of SA benefits from the fact that other jobs are also using SA.
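The core mechanism described above — an SA instance shrinking its processor request as load rises, trading parallelism for shorter queue waits — can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the Amdahl-style runtime estimate, the toy wait-time model, and all function names here are assumptions chosen for clarity.

```python
# Hypothetical sketch of an SA-style request selection for a moldable job.
# The runtime and wait-time models below are illustrative assumptions.

def runtime(n, seq_time=1000.0, parallel_frac=0.9):
    """Amdahl's-law runtime estimate on n processors."""
    return seq_time * ((1 - parallel_frac) + parallel_frac / n)

def expected_wait(n, load, base_wait=50.0):
    """Toy queue model: larger requests wait longer, more so under load."""
    return base_wait * n * load

def select_request(max_procs, load):
    """Pick the request size that minimizes estimated turnaround time."""
    candidates = range(1, max_procs + 1)
    return min(candidates, key=lambda n: expected_wait(n, load) + runtime(n))

# Under light load the scheduler asks for many processors; under heavy
# load it shrinks its request to avoid long queue waits, which also
# raises efficiency (less parallel overhead per unit of work).
light = select_request(max_procs=64, load=0.1)
heavy = select_request(max_procs=64, load=2.0)
assert heavy < light
```

In this toy model the chosen request drops from 13 processors at light load to 3 at heavy load, mirroring the aggregate effect the paper observes: smaller, more efficient requests reduce the offered load on the machine.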