Given a discretization stencil, partitioning the problem domain is an important first step for the efficient solution of partial differential equations on multiple processor systems. We derive partitions that minimize interprocessor communication when the number of processors is known a priori and each domain partition is assigned to a different processor. Our partitioning technique uses the stencil structure to select appropriate partition shapes. For square problem domains, we show that nonstandard partitions (e.g., hexagons) are frequently preferable to the standard square partitions for a variety of commonly used stencils. We conclude with a formalization of the relationship between partition shape, stencil structure, and architecture, allowing selection of optimal partitions for a variety of parallel systems.
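The dependence of communication volume on both partition shape and stencil can be illustrated with a small counting experiment. The sketch below is not the paper's derivation: it assumes a square n-by-n grid, compares only strip and square partitions (hexagons are not modeled), and uses hypothetical helper names (`strip_owner`, `square_owner`, `max_halo`). For each partition it counts the distinct off-partition grid points that the stencil touches, and reports the largest such count as a rough proxy for per-iteration interprocessor communication.

```python
from collections import defaultdict

# Stencils as lists of (row, column) offsets from the point being updated.
FIVE_POINT = [(-1, 0), (1, 0), (0, -1), (0, 1)]
NINE_POINT = FIVE_POINT + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def strip_owner(n, p):
    """Assign rows to p horizontal strips (assumes p divides n)."""
    rows = n // p
    return lambda i, j: i // rows


def square_owner(n, p):
    """Assign points to p square blocks (assumes p is a perfect square
    and sqrt(p) divides n)."""
    q = int(round(p ** 0.5))
    side = n // q
    return lambda i, j: (i // side) * q + (j // side)


def max_halo(n, owner, stencil):
    """Largest number of distinct remote points any one partition needs."""
    needed = defaultdict(set)
    for i in range(n):
        for j in range(n):
            me = owner(i, j)
            for di, dj in stencil:
                a, b = i + di, j + dj
                inside = 0 <= a < n and 0 <= b < n
                if inside and owner(a, b) != me:
                    needed[me].add((a, b))
    return max(len(points) for points in needed.values())


if __name__ == "__main__":
    n, p = 16, 16
    for name, stencil in [("5-point", FIVE_POINT), ("9-point", NINE_POINT)]:
        strips = max_halo(n, strip_owner(n, p), stencil)
        squares = max_halo(n, square_owner(n, p), stencil)
        print(f"{name}: strips={strips}, squares={squares}")
```

In this toy setting the square partitions need fewer remote points than strips, and the nine-point stencil raises the cost for squares (the diagonal neighbors add corner points) while leaving strips unchanged. That is the kind of interaction between stencil structure and partition shape the abstract describes; the counts say nothing about message startup costs or about any particular architecture.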