Shared-memory multiprocessors are frequently used as compute servers with multiple parallel applications executing at the same time. In such environments, the efficiency of a parallel application can be significantly affected by the operating system scheduling policy. In this paper, we use detailed simulation studies to evaluate the performance of several different scheduling strategies. These include regular priority scheduling, coscheduling (also called gang scheduling), process control with processor partitioning, handoff scheduling, and affinity-based scheduling. We also explore tradeoffs between busy-waiting and blocking synchronization primitives and how each interacts with the scheduling strategies. Since effective use of caches is essential to achieving high performance, a key focus is the impact of the scheduling strategies on the caching behavior of the applications.

Our results show that when the number of processes exceeds the number of processors, regular priority-based scheduling combined with busy-waiting synchronization primitives results in extremely poor processor utilization. In such situations, using blocking synchronization primitives can significantly improve performance. Process control and gang scheduling offer the highest performance, and their performance is relatively independent of the synchronization method used. However, for applications whose sizable working sets fit into the cache, process control performs better than gang scheduling. For the applications considered, the performance gains from handoff scheduling and processor affinity are small.
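Under process control with processor partitioning, each application runs only as many processes as it has been allocated processors, so the total number of runnable processes never exceeds the machine size and busy-waiting does not spin against a descheduled lock holder. The abstract does not state how processors are divided. The following minimal Python sketch therefore assumes an equal-share (equipartition) rule in which processors an application cannot use are redistributed to the others. The names `equipartition` and `max_parallelism` are hypothetical and used only for illustration.

```python
from typing import List


def equipartition(num_processors: int, max_parallelism: List[int]) -> List[int]:
    """Hypothetical equal-share processor partitioning (an assumption, not the paper's rule).

    max_parallelism[i] is the most processes application i can usefully run.
    Returns the number of processors allocated to each application.
    """
    alloc = [0] * len(max_parallelism)
    remaining = num_processors
    # Applications that can still accept more processors.
    active = [i for i, cap in enumerate(max_parallelism) if cap > 0]
    while remaining > 0 and active:
        # Split what is left evenly; the first `extra` applications get one more.
        share, extra = divmod(remaining, len(active))
        still_active = []
        for rank, i in enumerate(active):
            offer = share + (1 if rank < extra else 0)
            give = min(offer, max_parallelism[i] - alloc[i])  # never exceed the cap
            alloc[i] += give
            remaining -= give
            if alloc[i] < max_parallelism[i]:
                still_active.append(i)
        # Capped applications drop out, so their unused share is redistributed.
        active = still_active
    return alloc


if __name__ == "__main__":
    # 16 processors, three applications; the first can use at most 4.
    print(equipartition(16, [4, 12, 12]))  # [4, 6, 6]
```

Each application would then set its number of runnable processes to its entry in the result. In the example, the first application is capped at 4, so the processors it cannot use are shared between the other two.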