Effective distributed scheduling of parallel workloads

Authors:
Andrea C. Dusseau; Remzi H. Arpaci; David E. Culler
Affiliations:
Computer Science Division, University of California, Berkeley; Computer Science Division, University of California, Berkeley; Computer Science Division, University of California, Berkeley
Venue:
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Year:
1996

Abstract

We present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of cooperating processes across processors. Of particular importance is the blocking algorithm, which decides the action of a process waiting for a communication or synchronization event to complete. Through simulation of bulk-synchronous parallel applications, we find that a simple two-phase fixed-spin blocking algorithm performs well; a two-phase adaptive algorithm that gathers run-time data on barrier wait-times performs slightly better. Our results hold for a range of machine parameters and parallel program characteristics. These findings are in direct contrast to the literature stating that explicit coscheduling is necessary for fine-grained programs. We show that the choice of the local scheduler is crucial, with a priority-based scheduler performing two to three times better than a round-robin scheduler. Overall, we find that the performance of implicit scheduling is near that of coscheduling (+/- 35%), without the requirement of explicit, global coordination.
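The two-phase blocking idea named in the abstract can be illustrated with a minimal sketch: a waiting process first spins (polls) for a bounded time, and only if the event has not arrived does it block and give up the processor. The code below is not the authors' simulator or implementation. The names `two_phase_wait` and `AdaptiveSpin`, the use of a Python `threading.Event` to stand in for a communication or barrier event, and the adaptive rule (spin for the largest recently observed wait, up to a ceiling) are all assumptions made for illustration; the abstract states only that the adaptive variant gathers run-time data on barrier wait-times.

```python
import threading
import time
from collections import deque


def two_phase_wait(event, spin_seconds):
    """Wait for `event` in two phases: spin, then block.

    Returns "spin" if the event arrived during the spin phase,
    or "block" if the caller had to block until it arrived.
    """
    deadline = time.perf_counter() + spin_seconds
    # Phase 1: poll the event until the spin budget is used up.
    while True:
        if event.is_set():
            return "spin"
        if time.perf_counter() >= deadline:
            break
    # Phase 2: block, releasing the processor to the local scheduler.
    event.wait()
    return "block"


class AdaptiveSpin:
    """Hypothetical adaptive spin budget driven by observed wait times.

    Assumed rule (not taken from the paper): spin for the largest of the
    last `window` observed waits, capped at `ceiling` seconds.
    """

    def __init__(self, initial, ceiling, window=8):
        self.spin = initial
        self.ceiling = ceiling
        self.samples = deque(maxlen=window)

    def record(self, wait_seconds):
        # Remember the latest wait and recompute the spin budget.
        self.samples.append(wait_seconds)
        self.spin = min(self.ceiling, max(self.samples))


if __name__ == "__main__":
    barrier_done = threading.Event()
    # Simulate a partner that reaches the barrier after 50 ms.
    threading.Timer(0.05, barrier_done.set).start()
    print(two_phase_wait(barrier_done, spin_seconds=0.001))
```

With a fixed spin budget, the caller passes the same `spin_seconds` on every wait. With the adaptive helper, the caller would pass `AdaptiveSpin.spin` and call `record` after each barrier completes. The ceiling reflects the general reason for using two phases: spinning longer than a bounded amount wastes processor time that another process could use.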