Compiler-directed data prefetching in multiprocessors with memory hierarchies

Authors:
Edward H. Gornish;Elana D. Granston;Alexander V. Veidenbaum
Affiliations:
Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, Urbana, Illinois;Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, Urbana, Illinois;Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, Urbana, Illinois
Venue:
ICS '90 Proceedings of the 4th international conference on Supercomputing
Year:
1990



Abstract

Multiprocessor systems use memory hierarchies to reduce long memory access times. Such a hierarchy must be managed automatically to obtain effective memory utilization. In this paper, we discuss the issues involved in obtaining an optimal memory management strategy for a memory hierarchy. We present an algorithm for finding the earliest point in a program at which a block of data can be prefetched. This determination is based on the control and data dependencies in the program. Such a method is an integral part of more general memory management algorithms. We demonstrate our method's potential by using static analysis to estimate the performance improvement afforded by our prefetching strategy and to analyze the reference patterns in a set of Fortran benchmarks. We also study the effectiveness of prefetching in a realistic shared-memory system using an RTL simulator and real codes. This study differs from previous ones by considering the benefits of prefetching in the presence of network contention.
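
The idea of an "earliest prefetch point" bounded by control and data dependencies can be illustrated with a toy sketch. This is not the algorithm from the paper, which is not reproduced on this page; it is a minimal illustration under simplifying assumptions: the program is a flat list of statements, each statement lists the blocks it reads and writes, and any branch is treated conservatively as a barrier. The names `Stmt`, `is_branch`, and `earliest_prefetch_point` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    """One statement in a straight-line program (hypothetical model)."""
    label: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)
    is_branch: bool = False  # assumption: a branch the later use depends on


def earliest_prefetch_point(stmts, use_index, block):
    """Return the smallest index i such that a prefetch of `block`
    inserted immediately before stmts[i] is still safe for the use
    at stmts[use_index].

    The prefetch is hoisted backwards one statement at a time and
    stops at:
      * a write to `block` (data dependence: an earlier prefetch
        would fetch a stale copy), or
      * a branch (control dependence: above it, the use might never
        execute, so the prefetch could be wasted).
    """
    point = use_index
    for i in range(use_index - 1, -1, -1):
        s = stmts[i]
        if block in s.writes or s.is_branch:
            break  # cannot hoist above this statement
        point = i
    return point


program = [
    Stmt("S0: A = ...", writes={"A"}),
    Stmt("S1: B = ...", writes={"B"}),
    Stmt("S2: C = B + 1", reads={"B"}, writes={"C"}),
    Stmt("S3: D = A + C", reads={"A", "C"}, writes={"D"}),
]

# A is last written at S0, so its prefetch can be hoisted to just before S1.
print(earliest_prefetch_point(program, 3, "A"))
# C is written at S2, so its prefetch cannot move above the use at S3.
print(earliest_prefetch_point(program, 3, "C"))
```

In this sketch, a larger distance between the returned index and the use leaves more statements to overlap with the memory access. A real compiler would work on a control-flow graph with array-section analysis rather than on a flat statement list.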