Stencils and problem partitionings: their influence on the performance of multiple processor systems

Authors:
D. A. Reed;L. M. Adams;M. L. Patrick
Venue:
IEEE Transactions on Computers
Year:
1987


Abstract

Given a discretization stencil, partitioning the problem domain is an important first step for the efficient solution of partial differential equations on multiple processor systems. We derive partitions that minimize interprocessor communication when the number of processors is known a priori and each domain partition is assigned to a different processor. Our partitioning technique uses the stencil structure to select appropriate partition shapes. For square problem domains, we show that nonstandard partitions (e.g., hexagons) are frequently preferable to the standard square partitions for a variety of commonly used stencils. We conclude with a formalization of the relationship between partition shape, stencil structure, and architecture, allowing selection of optimal partitions for a variety of parallel systems.
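
To make the link between stencil structure and partition shape concrete, the sketch below counts the grid points that one interior rectangular partition must fetch from neighboring partitions in a single sweep. It is an illustration only, not the paper's derivation: the function name `halo_size`, the stencil offset lists, and the restriction to rectangular partitions are all assumptions made here, and hexagonal or other nonstandard shapes are not modeled.

```python
# Illustrative sketch (not from the paper): count the off-partition points
# an interior rectangular partition reads for a given stencil.
# Assumes the partition is surrounded by other partitions on all sides,
# so domain-boundary effects are ignored.

def halo_size(stencil, height, width):
    """Return the number of distinct points outside a height x width block
    that are referenced when the stencil is applied at every block point."""
    halo = set()
    for i in range(height):
        for j in range(width):
            for di, dj in stencil:
                r, c = i + di, j + dj
                if not (0 <= r < height and 0 <= c < width):
                    halo.add((r, c))
    return len(halo)

# Stencils as lists of (row offset, column offset); names are hypothetical.
FIVE_POINT = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
NINE_POINT = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]

if __name__ == "__main__":
    # Two partitions with the same area (256 points) but different shapes.
    for name, stencil in (("5-point", FIVE_POINT), ("9-point", NINE_POINT)):
        strip = halo_size(stencil, 4, 64)
        square = halo_size(stencil, 16, 16)
        print(f"{name}: strip 4x64 -> {strip}, square 16x16 -> {square}")
```

For equal-area partitions, the 16 x 16 square reads 64 outside points under the 5-point stencil and 68 under the 9-point stencil, while the 4 x 64 strip reads 136 and 140. Both the shape and the stencil change the communication volume, which is the dependence the abstract describes.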