A processor self-scheduling scheme is proposed for general parallel nested loops in multiprocessor systems. In this scheme, programs are instrumented so that processors schedule loop iterations among themselves dynamically at run time, without involving the operating system. The scheme has two levels. At the low level, it uses simple fetch-and-op operations to take advantage of the regular structure of the innermost parallel loop nests; at the high level, the irregular structure of the outer loops (parallel or serial) and the IF-THEN-ELSE constructs is handled with dynamic parallel linked lists. The larger granularity of the processes at the high level easily justifies the added overhead of maintaining such dynamic data structures. The use of guided self-scheduling (GSS) and shortest-delay self-scheduling (SDSS) in this scheme is analyzed.
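The low-level idea can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a single flat parallel loop, and it emulates the fetch-and-add primitive with a lock, since Python exposes no hardware fetch-and-op. The names `FetchAndAdd`, `self_schedule`, and `gss_chunks` are hypothetical. `self_schedule` lets each worker claim its next iteration from a shared counter with no central scheduler, and `gss_chunks` lists the chunk sizes that guided self-scheduling would hand out when each claim takes ceil(remaining / P) iterations.

```python
import math
import threading


class FetchAndAdd:
    """Lock-based stand-in (an assumption) for a hardware fetch-and-add."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, delta):
        # Atomically return the old value and add delta to the counter.
        with self._lock:
            old = self._value
            self._value += delta
            return old


def self_schedule(n_iters, n_workers, body):
    """Run body(i) for every i in range(n_iters).

    Workers claim iterations themselves from a shared counter,
    so no central scheduler is involved.
    """
    counter = FetchAndAdd(0)

    def worker():
        while True:
            i = counter.fetch_and_add(1)  # claim the next unclaimed iteration
            if i >= n_iters:
                return  # loop exhausted; this worker is done
            body(i)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def gss_chunks(n_iters, n_workers):
    """Chunk sizes under guided self-scheduling: ceil(remaining / P) per claim."""
    chunks = []
    remaining = n_iters
    while remaining > 0:
        size = math.ceil(remaining / n_workers)
        chunks.append(size)
        remaining -= size
    return chunks
```

For example, `gss_chunks(10, 2)` yields chunks of 5, 3, 1, and 1 iterations: early claims are large to keep scheduling overhead low, and late claims are small to balance the finish times of the processors.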