Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors

Authors: E. P. Markatos; T. J. LeBlanc
Venue: IEEE Transactions on Parallel and Distributed Systems
Year: 1994

Abstract

Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempted to achieve the minimum completion time by distributing the workload as evenly as possible while minimizing the number of synchronization operations required. The authors consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. They show that traditional loop scheduling algorithms, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. They propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the data they access. They compare this algorithm with other known algorithms using five representative kernel programs on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that it offers substantial performance improvements, up to a factor of 4 in some cases. The authors conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.
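The abstract names three goals (load balance, few synchronization operations, iterations co-located with their data) without spelling out the algorithm. The toy, single-threaded simulation below illustrates the trade-off under stated assumptions; it is not the authors' implementation. It assumes that iteration i's data already sits near processor i // ceil(N/P), as if left by an earlier run of the same loop with a block partition. The affinity-style scheduler gives each processor a local queue holding its home block, takes ceil(r/k) of its r remaining local iterations per request (k = P by default), and, once idle, steals ceil(r/P) iterations from the most loaded queue. The baseline is guided self-scheduling from one shared queue. All function names, the cost model, and the parameter k are hypothetical choices for illustration; "grabs" counts queue operations and "remote" counts iterations executed away from their assumed home processor.

```python
import heapq
import math
from collections import deque


def home_of(i, n, p):
    """Hypothetical data placement: iteration i's data is assumed to sit in the
    cache/local memory of processor i // ceil(n / p) (a static block partition,
    as if left behind by an earlier execution of the same loop)."""
    return i // math.ceil(n / p)


def _record(chunk, pid, n, p, owner):
    """Mark the iterations in chunk as run by pid; return how many were remote."""
    remote = 0
    for i in chunk:
        owner[i] = pid
        if home_of(i, n, p) != pid:
            remote += 1
    return remote


def guided_self_schedule(cost, p):
    """Baseline: one shared queue; each request takes ceil(R / p) of the R
    remaining iterations, in index order, regardless of data location."""
    n = len(cost)
    next_i, grabs, remote, finish = 0, 0, 0, 0.0
    owner = [None] * n
    ready = [(0.0, pid) for pid in range(p)]  # (time processor becomes free, id)
    heapq.heapify(ready)
    while next_i < n:
        t, pid = heapq.heappop(ready)
        size = math.ceil((n - next_i) / p)
        chunk = range(next_i, next_i + size)
        next_i += size
        grabs += 1  # one synchronized access to the shared queue
        remote += _record(chunk, pid, n, p, owner)
        t += sum(cost[i] for i in chunk)
        finish = max(finish, t)
        heapq.heappush(ready, (t, pid))
    return {"makespan": finish, "grabs": grabs, "remote": remote, "owner": owner}


def affinity_schedule(cost, p, k=None):
    """Affinity-style sketch: per-processor queues seeded with each processor's
    home block; take ceil(r / k) of the r local iterations per request, and when
    the local queue is empty steal ceil(r / p) from the most loaded queue."""
    n = len(cost)
    k = k or p
    block = math.ceil(n / p)
    queues = [deque(range(q * block, min((q + 1) * block, n))) for q in range(p)]
    grabs = steals = remote = 0
    finish = 0.0
    owner = [None] * n
    ready = [(0.0, pid) for pid in range(p)]
    heapq.heapify(ready)
    while ready:
        t, pid = heapq.heappop(ready)
        local = queues[pid]
        if local:
            size = math.ceil(len(local) / k)
            chunk = [local.popleft() for _ in range(size)]
        else:
            victim = max(range(p), key=lambda q: len(queues[q]))
            if not queues[victim]:
                continue  # every queue is empty: this processor retires
            size = math.ceil(len(queues[victim]) / p)
            chunk = [queues[victim].pop() for _ in range(size)]  # take from the tail
            steals += 1
        grabs += 1
        remote += _record(chunk, pid, n, p, owner)
        t += sum(cost[i] for i in chunk)
        finish = max(finish, t)
        heapq.heappush(ready, (t, pid))
    return {"makespan": finish, "grabs": grabs, "steals": steals,
            "remote": remote, "owner": owner}


if __name__ == "__main__":
    uniform = [1] * 16                        # balanced loop, 16 iterations
    triangular = [16 - i for i in range(16)]  # early iterations are heavier
    for name, cost in [("uniform", uniform), ("triangular", triangular)]:
        g = guided_self_schedule(cost, 4)
        a = affinity_schedule(cost, 4)
        print(name, "GSS:", g["makespan"], g["grabs"], g["remote"],
              "| affinity:", a["makespan"], a["grabs"], a["steals"], a["remote"])
```

On the uniform 16-iteration loop with 4 processors, both schedulers finish at the same simulated time, but the shared-queue baseline runs 5 of the 16 iterations away from their home processor in 8 queue operations, while the affinity-style version runs every iteration locally using 16 queue operations, each of which touches only the processor's own queue. On the triangular loop, the processor with the lightest block runs out of local work first and steals from the most loaded queue, trading a few remote iterations for better balance.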