A practical processor self-scheduling scheme, trapezoid self-scheduling, is proposed for arbitrary parallel nested loops in shared-memory multiprocessors. Loops are generally the richest source of parallelism in parallel programs. By dynamically allocating loop iterations to processors, one may achieve load balancing among processors at the expense of run-time scheduling overhead. By linearly decreasing the chunk size at run time, the proposed trapezoid self-scheduling approach obtains the best tradeoff between scheduling overhead and balanced workload. Owing to its simplicity and flexibility, this approach can be implemented efficiently in any parallel compiler. The small and predictable number of chores also allows memory to be managed efficiently in a static fashion. Experiments conducted on a 96-node Butterfly GP-1000 clearly show the advantage of trapezoid self-scheduling over other well-known self-scheduling approaches.
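The linear decrease in chunk size can be illustrated with a minimal sketch. The abstract does not give the parameters, so the choices here are assumptions: a first chunk of N/(2P) iterations, a last chunk of 1, and a fixed decrement derived from those two values. The function name `tss_chunks` and its arguments are hypothetical and are not taken from the paper.

```python
import math


def tss_chunks(n_iters, n_procs, first=None, last=1):
    """Return chunk sizes that decrease linearly from `first` to `last`.

    Assumptions (not stated in the abstract): first defaults to
    ceil(n_iters / (2 * n_procs)) and last defaults to 1.
    """
    if n_iters <= 0:
        return []
    if first is None:
        first = max(last, math.ceil(n_iters / (2 * n_procs)))

    # Number of chunks needed if sizes form a trapezoid from first to last.
    n_chunks = math.ceil(2 * n_iters / (first + last))
    # Constant decrement between consecutive chunks.
    delta = (first - last) / (n_chunks - 1) if n_chunks > 1 else 0.0

    chunks = []
    remaining = n_iters
    size = float(first)
    while remaining > 0:
        # Truncate to an integer, never below `last`, never above what is left.
        chunk = min(max(last, int(size)), remaining)
        chunks.append(chunk)
        remaining -= chunk
        size -= delta
    return chunks


if __name__ == "__main__":
    # Each idle processor would take the next chunk from this list in turn.
    print(tss_chunks(1000, 4))
```

Because the whole sequence is determined by the loop bounds and the processor count, the number of chunks is known before the loop starts, which is what makes the static memory management mentioned in the abstract plausible. In a real runtime the list would not be materialized; each processor would compute its next chunk from a shared counter under synchronization.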