On the problem of optimizing data transfers for complex memory systems

Authors:
K. Gallivan;W. Jalby;D. Gannon
Affiliations:
Univ. of Illinois at Urbana-Champaign, Urbana;INRIA, France;Indiana Univ., Bloomington
Venue:
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Year:
1988

Citing 5
Cited 33

A cache coherence scheme with fast selective invalidation

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Semi-automatic process partitioning for parallel computation

International Journal of Parallel Programming
Strategies for cache and local memory management by global program transformation

Proceedings of the 1st International Conference on Supercomputing
Organizing matrices and matrix operations for paged memory systems

Communications of the ACM
On program restructuring, scheduling, and communication for parallel processor systems

Run-time parallelization and scheduling of loops

SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
Building analytical models into an interactive performance prediction tool

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A methodology for parallelizing programs for multicomputers and complex memory multiprocessors

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Experimentally Characterizing the Behavior of Multiprocessor Memory Systems: A Case Study

IEEE Transactions on Software Engineering
Run-Time Parallelization and Scheduling of Loops

IEEE Transactions on Computers
Beyond loop partitioning: data assignment and overlap to reduce communication overhead

ICS '91 Proceedings of the 5th international conference on Supercomputing
Scanning polyhedra with DO loops

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallelization and performance of Conjugate Gradient algorithms on the Cedar hierarchical-memory multiprocessor

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Fortran at ten gigaflops: the connection machine convolution compiler

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Tiling multidimensional iteration spaces for nonshared memory machines

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Seismic modeling at 14 gigaflops on the connection machine

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Detecting redundant accesses to array data

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Automatic partitioning of a program dependence graph into parallel tasks

IBM Journal of Research and Development
To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers

ICS '95 Proceedings of the 9th international conference on Supercomputing
Compiler techniques for data partitioning of sequentially iterated parallel loops

ICS '90 Proceedings of the 4th international conference on Supercomputing
Compiler-directed data prefetching in multiprocessors with memory hierarchies

ICS '90 Proceedings of the 4th international conference on Supercomputing
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers

IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
An Efficient Solution to the Cache Thrashing Problem Caused by True Data Sharing

IEEE Transactions on Computers
The doconsider loop

ICS '89 Proceedings of the 3rd international conference on Supercomputing
Performance prediction of loop constructs on multiprocessor hierarchical-memory systems

ICS '89 Proceedings of the 3rd international conference on Supercomputing
Cache miss equations: a compiler framework for analyzing and tuning memory behavior

ACM Transactions on Programming Languages and Systems (TOPLAS)
Reducing memory requirements of nested loops for embedded systems

Proceedings of the 38th annual Design Automation Conference
PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators

Journal of VLSI Signal Processing Systems
Pipelined Data Parallel Algorithms-II: Design

IEEE Transactions on Parallel and Distributed Systems
Compile-Time Partitioning of Iterative Parallel Loops to Reduce Cache Coherency Traffic

IEEE Transactions on Parallel and Distributed Systems
Compile-Time Techniques for Data Distribution in Distributed Memory Machines

IEEE Transactions on Parallel and Distributed Systems
Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers

IEEE Transactions on Parallel and Distributed Systems
Efficient Dependence Analysis for Java Arrays

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Reference Distance as a Metric for Data Locality

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Miss Rate Prediction Across Program Inputs and Cache Configurations

IEEE Transactions on Computers
Mapping the LU decomposition on a many-core architecture: challenges and solutions

Proceedings of the 6th ACM conference on Computing frontiers
Phase-Based miss rate prediction across program inputs

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing

Abstract

Parallel supercomputer architectures with complex memory hierarchies or distributed memory systems have become very common. Unfortunately, the problems associated with restructuring software to take advantage of these memory systems are not easily solved. This paper presents an overview of some of the mathematical issues behind several of these problems and briefly examines some of the potential solutions.