Program parallelism and processor allocation issues for parallel processor systems are discussed. Optimal processor assignment algorithms are presented for simple and complex nested parallel loops. A compiler can use these assignment schemes to allocate processors statically to multiply nested parallel loops. Speedup measurements are also presented for EISPACK and IEEE DSP subroutines under the optimal assignment of processors to parallel loops. These measurements indicate that optimal processor assignments yield almost linear speedups on parallel processor machines with a few tens of processors, and significantly high speedups on machines with hundreds or thousands of processors.
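To make the static allocation problem concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a perfectly nested set of parallel loops with known iteration counts, unit-time loop bodies, and a cost model in which a loop of N iterations given p processors takes ceil(N / p) steps. The names `parallel_time` and `best_allocation` are hypothetical, and the exhaustive search is for illustration only.

```python
from itertools import product
from math import ceil, prod


def parallel_time(bounds, alloc):
    """Steps needed when loop i (bounds[i] iterations) gets alloc[i] processors.

    Assumed cost model: each nesting level contributes ceil(N_i / p_i) steps,
    and the levels multiply because the loops are perfectly nested.
    """
    return prod(ceil(n / p) for n, p in zip(bounds, alloc))


def best_allocation(bounds, total_procs):
    """Brute-force search for an allocation (p_1, ..., p_k) with
    p_1 * ... * p_k <= total_procs that minimizes parallel_time.

    Returns (time, allocation). Exponential in the nest depth, so it is
    only suitable for small illustrative cases.
    """
    # Giving a loop more processors than it has iterations cannot help.
    ranges = [range(1, min(n, total_procs) + 1) for n in bounds]
    best = None
    for alloc in product(*ranges):
        if prod(alloc) > total_procs:
            continue  # uses more processors than the machine has
        t = parallel_time(bounds, alloc)
        if best is None or t < best[0]:
            best = (t, alloc)
    return best


if __name__ == "__main__":
    bounds = (10, 7)  # hypothetical doubly nested parallel loop
    time, alloc = best_allocation(bounds, total_procs=16)
    speedup = prod(bounds) / time
    print(f"allocation={alloc} time={time} speedup={speedup:.1f}")
```

Under this model the split of processors across nesting levels matters: for the hypothetical 10-by-7 nest on 16 processors, an even 4-by-4 split takes 6 steps, while the search finds an uneven split that takes fewer.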