An important issue in the efficient use of multiprocessor systems is the assignment of processors to nested parallel loops. A processor assignment algorithm should be fast and should always generate an optimal assignment. This paper proposes two efficient algorithms that determine the optimal number of processors to assign to each individual loop, and presents efficient parallel counterparts of both. The algorithms always generate an optimal processor assignment and are much faster than the existing optimal algorithm in the literature. The paper also discusses improving parallel execution performance by transforming a nested parallel loop into a semantically equivalent one. Three loop transformations are investigated, and in most cases the parallel execution time improves after they are applied.
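To make the assignment problem concrete, the following is a minimal sketch of an exhaustive baseline. It is not either of the paper's algorithms, which the abstract does not describe. The cost model is an assumption: a perfectly nested parallel loop with trip counts N_1, ..., N_m and a unit-time body, where assigning p_i processors to loop i (with the product of all p_i at most P) gives a parallel time of ceil(N_1/p_1) * ... * ceil(N_m/p_m). The names `best_assignment`, `trip_counts`, and `processors` are hypothetical.

```python
from math import ceil


def best_assignment(trip_counts, processors):
    """Exhaustively search per-loop processor counts.

    Assumed cost model (not taken from the paper): with unit-time loop
    bodies, parallel time is the product of ceil(N_i / p_i) over all loop
    levels, subject to the product of the p_i not exceeding `processors`.
    Returns (allocation tuple, parallel time).
    """
    best_time = None
    best_alloc = None

    def search(level, remaining, alloc, time_so_far):
        nonlocal best_time, best_alloc
        if level == len(trip_counts):
            # Complete allocation: keep it if it is strictly better.
            if best_time is None or time_so_far < best_time:
                best_time, best_alloc = time_so_far, tuple(alloc)
            return
        n = trip_counts[level]
        # More than n processors on a loop with n iterations cannot help.
        for p in range(1, min(n, remaining) + 1):
            # remaining // p bounds the processor product left for inner loops.
            search(level + 1, remaining // p, alloc + [p],
                   time_so_far * ceil(n / p))

    search(0, processors, [], 1)
    return best_alloc, best_time


if __name__ == "__main__":
    # Two nested parallel loops with 5 and 7 iterations, 6 processors.
    print(best_assignment((5, 7), 6))
```

Under this assumed model, `best_assignment((5, 7), 6)` selects 5 processors for the outer loop and 1 for the inner loop, for a parallel time of 7. Treating the same 35 iterations as a single loop, as `best_assignment((35,), 6)` does, yields a time of 6, which illustrates why restructuring a loop nest can shorten execution. Whether such a collapse is among the three transformations the paper investigates is not stated in the abstract.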