Data Locality Exploitation in the Decomposition of Regular Domain Problems

Authors:
Manuel Prieto;Ignacio M. Llorente;Francisco Tirado
Affiliations:
Univ. Complutense de Madrid, Madrid, Spain;Univ. Complutense de Madrid, Madrid, Spain;Univ. Complutense de Madrid, Madrid, Spain
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2000

Abstract

This paper studies how exploitation of the local memory hierarchy and of the communication network affects message sending, and how that effect in turn influences the decomposition of regular applications. We consider two parallel computers, a Cray T3E-900 and an SGI Origin 2000. On both systems, the bandwidth reduction caused by non-unit-stride memory access is significant and may outweigh the reduction caused by contention in the network. These findings affect the choice of optimal decompositions for regular domain problems. Although traditional 3D decompositions yield lower inherent communication-to-computation ratios and could exploit the interconnection network more efficiently, lower-dimensional decompositions are found to be more efficient because of the effect of the data decomposition on the spatial locality of the messages to be communicated. The growing importance of local optimisations is also shown with a well-known communication-computation overlapping technique which, contrary to expectation, increases execution time rather than reducing it because of poor cache memory exploitation.
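
The trade-off described in the abstract can be illustrated with a small counting sketch. It is not the authors' model or code. It assumes a cubic n x n x n grid stored in row-major (C) order, a processor grid that divides n evenly, and two neighbours along every partitioned axis. The function names (`face_layout`, `summarize`) are hypothetical. For each decomposition it reports the communication-to-computation ratio (boundary elements sent per locally computed element) and the shortest contiguous run of memory in any boundary face, which is a rough proxy for the spatial locality of the messages.

```python
from math import prod


def face_layout(shape, axis):
    """Describe how the boundary face normal to `axis` lies in memory.

    Assumes a row-major (C-order) local block with the given `shape`.
    Returns (number_of_contiguous_runs, elements_per_run); adjacent runs
    that happen to touch are not merged, so this is a conservative count.
    """
    # Fixing the index along `axis` keeps all trailing axes contiguous,
    # while every combination of leading indices starts a new run.
    runs = prod(shape[:axis])
    run_length = prod(shape[axis + 1:])
    return runs, run_length


def summarize(n, procs_per_axis):
    """Summarize one decomposition of an n^3 grid (illustrative only).

    `procs_per_axis` is the processor grid, e.g. (64, 1, 1) for a 1D
    decomposition. Assumes n is divisible by each entry and that each
    partitioned axis has two neighbours (interior or periodic block).
    """
    shape = tuple(n // p for p in procs_per_axis)
    volume = prod(shape)  # elements computed locally
    surface = 0           # elements sent per exchange
    shortest_run = None
    for axis, p in enumerate(procs_per_axis):
        if p == 1:
            continue      # axis not partitioned: no face to exchange
        surface += 2 * (volume // shape[axis])
        _, run_length = face_layout(shape, axis)
        if shortest_run is None or run_length < shortest_run:
            shortest_run = run_length
    return {
        "local_shape": shape,
        "comm_to_comp": surface / volume,
        "shortest_run": shortest_run,
    }


if __name__ == "__main__":
    # 64 processors arranged as 1D, 2D and 3D grids over a 256^3 domain.
    for grid in [(64, 1, 1), (8, 8, 1), (4, 4, 4)]:
        print(grid, summarize(256, grid))
```

Under these assumptions, with n = 256 and 64 processors, the 1D split sends a single contiguous run of 65536 elements per face at a ratio of 0.5, the 2D split has a ratio of 0.125 with runs no shorter than 256 elements, and the 3D split reaches the lowest ratio, 0.09375, but one of its faces is made entirely of single-element runs (non-unit stride). Whether the lower ratio or the better contiguity wins on a given machine is an empirical question of the kind the paper examines on the T3E and the Origin 2000.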