The impact of operating system scheduling policies and synchronization methods on the performance of parallel applications

Authors:
Anoop Gupta;Andrew Tucker;Shigeru Urushibara
Venue:
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Year:
1991


Abstract

Shared-memory multiprocessors are frequently used as compute servers with multiple parallel applications executing at the same time. In such environments, the efficiency of a parallel application can be significantly affected by the operating system scheduling policy. In this paper, we use detailed simulation studies to evaluate the performance of several different scheduling strategies. These include regular priority scheduling, coscheduling or gang scheduling, process control with processor partitioning, handoff scheduling, and affinity-based scheduling. We also explore tradeoffs between the use of busy-waiting and blocking synchronization primitives and their interactions with the scheduling strategies. Since effective use of caches is essential to achieving high performance, a key focus is on the impact of the scheduling strategies on the caching behavior of the applications.

Our results show that in situations where the number of processes exceeds the number of processors, regular priority-based scheduling in conjunction with busy-waiting synchronization primitives results in extremely poor processor utilization. In such situations, use of blocking synchronization primitives can significantly improve performance. Process control and gang scheduling strategies are shown to offer the highest performance, and their performance is relatively independent of the synchronization method used. However, for applications that have sizable working sets that fit into the cache, process control performs better than gang scheduling. For the applications considered, the performance gains due to handoff scheduling and processor affinity are shown to be small.
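The distinction the abstract draws between busy-waiting and blocking synchronization primitives can be illustrated with a minimal Python sketch. This is not the simulator or the primitives used in the paper: the names `SpinLock`, `BlockingLock`, and `run_workers` are hypothetical and chosen only for illustration. A busy-waiting waiter keeps polling and therefore stays runnable, consuming processor time; a blocking waiter is suspended until the lock is released, which frees the processor for other work.

```python
import threading


class SpinLock:
    """Busy-waiting lock (hypothetical sketch): a waiter polls and never sleeps."""

    def __init__(self):
        # threading.Lock is used here only as an atomic test-and-set flag.
        self._flag = threading.Lock()
        # Rough count of failed polls; updated without synchronization,
        # so treat it as an approximate indicator only.
        self.failed_polls = 0

    def acquire(self):
        # Non-blocking attempt in a loop: the thread stays runnable while waiting.
        while not self._flag.acquire(blocking=False):
            self.failed_polls += 1

    def release(self):
        self._flag.release()


class BlockingLock:
    """Blocking lock (hypothetical sketch): a waiter is suspended until release."""

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self):
        # A blocking acquire puts the waiting thread to sleep until the lock is free.
        self._lock.acquire()

    def release(self):
        self._lock.release()


def run_workers(lock, n_threads=4, n_iters=500):
    """Increment a shared counter under `lock` and return the final value."""
    counter = [0]

    def worker():
        for _ in range(n_iters):
            lock.acquire()
            try:
                counter[0] += 1  # critical section
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


if __name__ == "__main__":
    spin = SpinLock()
    print("spin total:", run_workers(spin), "failed polls (approx.):", spin.failed_polls)
    print("blocking total:", run_workers(BlockingLock()))
```

Both locks produce the same final count; they differ in what a waiter does while the lock is held. The number of failed polls varies from run to run and says nothing about the paper's measurements. It only shows that a spinning waiter performs work that a blocked waiter would not.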