This paper considers the parallel execution of nonvectorizable uniform recurrences. When naively scheduled, such recurrences can incur unacceptable communication and synchronization overhead on a multiprocessor. The minimum-distance method partitions such recurrences into totally independent computations without increasing redundancy or perturbing numerical stability. The independent computations are well suited to execution on a multiprocessor, but they may not use all available processors, so the paper also addresses how extra processors can be applied to them. The methods are especially attractive for multiprocessors composed of clusters.
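
To make the idea of splitting a uniform recurrence into independent computations concrete, here is a minimal sketch for the one-dimensional case only. It is not the paper's minimum-distance algorithm. It relies on the standard observation that, when every dependence distance is a multiple of g (the gcd of the distances), iteration i only ever reads iterations in the same residue class modulo g. The function names, the toy recurrence body (sum of the dependences plus i), and the initial value are all assumptions made for illustration.

```python
from functools import reduce
from math import gcd


def partition_recurrence(n, distances):
    """Split iterations 0..n-1 into independent sets (1-D illustration).

    Every dependence distance is a multiple of g = gcd(distances), so
    iteration i only depends on iterations congruent to i modulo g.
    """
    g = reduce(gcd, distances)
    return [list(range(r, n, g)) for r in range(g)]


def step(x, i, distances):
    """Toy recurrence body (hypothetical): x[i] = sum of its dependences + i."""
    deps = [x[i - d] for d in distances if i - d >= 0]
    if deps:
        x[i] = sum(deps) + i


def run_sequential(n, distances, init=1):
    """Reference execution in the original iteration order."""
    x = [init] * n
    for i in range(n):
        step(x, i, distances)
    return x


def run_partitioned(n, distances, init=1):
    """Execute one independent set at a time.

    The sets share no dependences, so each could be assigned to its own
    processor with no communication or synchronization between sets.
    """
    x = [init] * n
    for part in partition_recurrence(n, distances):
        for i in part:  # order inside a set is preserved
            step(x, i, distances)
    return x


if __name__ == "__main__":
    # Distances 4 and 6 have gcd 2, giving two independent sets.
    print(partition_recurrence(12, [4, 6]))
    print(run_sequential(12, [4, 6]) == run_partitioned(12, [4, 6]))
```

Each set performs the same operations in the same relative order as the sequential loop, so no work is duplicated and the arithmetic is unchanged. The number of sets is fixed by the dependence distances and may be smaller than the number of processors available, which is the situation the abstract raises when it mentions extra processors.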