Allocating Independent Subtasks on Parallel Processors
IEEE Transactions on Software Engineering
Assignment problems in parallel and distributed computing
Assignment problems in parallel and distributed computing
Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
An open enviornment for building parallel programming systems
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Simple but effective techniques for NUMA memory management
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Process control and scheduling issues for multiprogrammed shared-memory multiprocessors
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors
IEEE Transactions on Computers
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
NUMA policies and their relation to memory architecture
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Analysis of task migration in shared-memory multiprocessor scheduling
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Experimental comparison of memory management policies for NUMA multiprocessors
ACM Transactions on Computer Systems (TOCS)
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Introduction to parallel computing
Introduction to parallel computing
Factoring: a method for scheduling parallel loops
Communications of the ACM
A dynamic scheduling method for irregular parallel programs
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Parallel Programming and Compilers
Parallel Programming and Compilers
Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers
IEEE Transactions on Parallel and Distributed Systems
Using Processor-Cache Affinity Information in Shared-Memory Multiprocessor Scheduling
IEEE Transactions on Parallel and Distributed Systems
Shared-Memory Multiprocessor Trends and the Implications for Parallel Program Performance
Shared-Memory Multiprocessor Trends and the Implications for Parallel Program Performance
Combining static and dynamic scheduling on distributed-memory multiprocessors
ICS '94 Proceedings of the 8th international conference on Supercomputing
Compiler optimizations for eliminating barrier synchronization
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Future applicability of bus-based shared memory multiprocessors
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Balancing processor loads and exploiting data locality in N-body simulations
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Unified compilation techniques for shared and distributed address space machines
ICS '95 Proceedings of the 9th international conference on Supercomputing
IEEE/ACM Transactions on Networking (TON)
A template for non-uniform parallel loops based on dynamic scheduling and prefetching techniques
ICS '96 Proceedings of the 10th international conference on Supercomputing
Adaptively Scheduling Parallel Loops in Distributed Shared-Memory Systems
IEEE Transactions on Parallel and Distributed Systems
Scheduling policies to support distributed 3D multimedia applications
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Coarse grained parallel computing on heterogeneous systems
SAC '98 Proceedings of the 1998 ACM symposium on Applied Computing
Cacheminer: A Runtime Approach to Exploit Cache Locality on SMP
IEEE Transactions on Parallel and Distributed Systems
Dynamic Task Scheduling Using Online Optimization
IEEE Transactions on Parallel and Distributed Systems
Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
The trade-off between implicit and explicit data distribution in shared-memory programming paradigms
ICS '01 Proceedings of the 15th international conference on Supercomputing
Affinity scheduling of unbalanced workloads
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Runtime vs. Manual Data Distribution for Architecture-Agnostic Shared-Memory Programming Models
International Journal of Parallel Programming
Enhancing Software DSM for Compiler-Parallelized Applications
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A Compile-Time Partitioning Strategy for Non-Rectangular Loop Nests
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Feedback Guided Scheduling of Nested Loops
PARA '00 Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia
IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Scheduling at Twilight the Easy Way
STACS '02 Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science
Customized dynamic load balancing for a network of workstations
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
LODS: locality-oriented dynamic scheduling for on-chip multiprocessors
Proceedings of the 41st annual Design Automation Conference
Shared memory multiprocessor support for functional array processing in SAC
Journal of Functional Programming
Design and implementation of a novel dynamic load balancing library for cluster computing
Parallel Computing - Heterogeneous computing
Feedback guided dynamic loop scheduling: convergence of the continuous case
The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
Proceedings of the 2006 workshop on Memory system performance and correctness
SAC: off-the-shelf support for data-parallelism on multicores
Proceedings of the 2007 workshop on Declarative aspects of multicore programming
Memory bank aware dynamic loop scheduling
Proceedings of the conference on Design, automation and test in Europe
Feedback-directed thread scheduling with memory considerations
Proceedings of the 16th international symposium on High performance distributed computing
Experience distributing objects in an SMMP OS
ACM Transactions on Computer Systems (TOCS)
Enhancing self-scheduling algorithms via synchronization and weighting
Journal of Parallel and Distributed Computing
Scalable loop self-scheduling schemes for heterogeneous clusters
International Journal of Computational Science and Engineering
Performance evaluation of a dynamic load-balancing library for cluster computing
International Journal of Computational Science and Engineering
Future Generation Computer Systems
The effectiveness of affinity-based scheduling in multiprocessor networking
INFOCOM'96 Proceedings of the Fifteenth annual joint conference of the IEEE computer and communications societies conference on The conference on computer communications - Volume 1
Dynamic multi phase scheduling for heterogeneous cluste
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
An efficient approach for self-scheduling parallel loops on multiprogrammed parallel computers
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
A new carried-dependence self-scheduling algorithm
ICCSA'05 Proceedings of the 2005 international conference on Computational Science and its Applications - Volume Part I
Convergence of the discrete FGDLS algorithm
HPCC'05 Proceedings of the First international conference on High Performance Computing and Communications
Hi-index | 0.00 |
Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempted to achieve the minimum completion time by distributing the workload as evenly as possible while minimizing the number of synchronization operations required. The authors consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. They show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. They propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. They compare the performance of this new algorithm to other known algorithms by using five representative kernel programs on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new algorithm offers substantial performance improvements, up to a factor of 4 in some cases. The authors conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.