Data and program restructuring of irregular applications for cache-coherent multiprocessors

Authors:
Karen A. Tomko;Santosh G. Abraham
Affiliations:
Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI;Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI
Venue:
ICS '94 Proceedings of the 8th international conference on Supercomputing
Year:
1994

Citing 10

A Partitioning Strategy for Nonuniform Problems on Multiprocessors
IEEE Transactions on Computers

Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming

Principles of runtime support for parallel processors
ICS '88 Proceedings of the 2nd international conference on Supercomputing

Software prefetching
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems

A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation

Characterizing the behavior of sparse algorithms on caches
Proceedings of the 1992 ACM/IEEE conference on Supercomputing

Evaluating the communication performance of MPPs using synthetic sparse matrix multiplication workloads
ICS '93 Proceedings of the 7th international conference on Supercomputing

Compiler-directed data prefetching in multiprocessors with memory hierarchies
ICS '90 Proceedings of the 4th international conference on Supercomputing

Compiling Parallel Loops for High Performance Computers: Partitioning, Data Assignment, and Remapping
Compiling Parallel Loops for High Performance Computers: Partitioning, Data Assignment, and Remapping

Parallelizing Loops with Indirect Array References or Pointers
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing

Cited 3

Efficient support for irregular applications on distributed-memory machines
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming

Improving cache performance in dynamic applications through data and computation reorganization at run time
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation

Run-Time Reference Clustering for Cache Performance Optimization
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis

Abstract

Applications with irregular data structures such as sparse matrices or finite element meshes account for a large fraction of engineering and scientific applications. Domain decomposition techniques are commonly used to partition these applications to reduce interprocessor communication on message passing parallel systems. Our work investigates the use of domain decomposition techniques on cache-coherent parallel systems.

Many good domain decomposition algorithms are now available. We show that further application improvements are attainable using data and program restructuring in conjunction with domain decomposition. We give techniques for data layout to reduce communication, blocking with subdomains to improve uniprocessor cache behavior, and insertion of prefetches to hide the latency of interprocessor communication.

This paper details our restructuring techniques and provides experimental results on the KSR1 multiprocessor for a sparse matrix application. The experimental results include counts of cache misses provided by the KSR PMON performance monitoring tool. Our data show that cache coherency traffic can be reduced by 30%–60% using our data layout scheme and that more than 53% of the remaining coherency cache misses can be eliminated using prefetch instructions.
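
The abstract does not spell out the data layout scheme, so the following is only an illustrative sketch of one plausible reading: once a domain decomposition has assigned each node to a subdomain, renumber the nodes so that each subdomain's data is contiguous, with interior nodes (touched by one processor only) placed before boundary nodes (shared with a neighboring subdomain). The function name `subdomain_layout`, the adjacency-list input, and the interior-before-boundary ordering are all assumptions made for this example, not details taken from the paper.

```python
def subdomain_layout(adjacency, part):
    """Hypothetical helper: compute a node ordering grouped by subdomain.

    adjacency[i] lists the neighbors of node i (e.g. the nonzero columns
    of row i in a sparse matrix); part[i] is the subdomain of node i.
    Returns (order, new_index) where order[k] is the old id stored at
    position k and new_index[old] is the new position of node old.
    """
    num_parts = max(part) + 1
    interior = [[] for _ in range(num_parts)]
    boundary = [[] for _ in range(num_parts)]

    for node, neighbors in enumerate(adjacency):
        # A node is on the boundary if any neighbor lives in another subdomain.
        crosses = any(part[nbr] != part[node] for nbr in neighbors)
        (boundary if crosses else interior)[part[node]].append(node)

    # Lay out each subdomain contiguously: interior nodes, then boundary nodes.
    order = []
    for p in range(num_parts):
        order.extend(interior[p])
        order.extend(boundary[p])

    new_index = [0] * len(part)
    for new, old in enumerate(order):
        new_index[old] = new
    return order, new_index


# Example: a 6-node chain 0-1-2-3-4-5 split into two subdomains.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
owners = [0, 0, 0, 1, 1, 1]
order, new_index = subdomain_layout(chain, owners)
```

In this example nodes 2 and 3 are the only boundary nodes, so they end up at the tail of their respective subdomain blocks. Grouping data this way keeps values written by different processors from sharing cache lines except where the subdomains genuinely meet, which is the kind of coherency traffic a layout scheme would aim to reduce.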