On the problem of optimizing data transfers for complex memory systems
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Strategies for cache and local memory management by global program transformation
Proceedings of the 1st International Conference on Supercomputing
Program Improvement by Source-to-Source Transformation
Journal of the ACM (JACM)
The NYU Ultracomputer—designing a MIMD, shared-memory parallel machine (Extended Abstract)
ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Compiler memory management and compound function definition for multiprocessors
Compiler memory management and compound function definition for multiprocessors
Software methods for improvement of cache performance on supercomputer applications
Software methods for improvement of cache performance on supercomputer applications
Work distribution in parallel programs for distributed memory multiprocessors
ICS '91 Proceedings of the 5th international conference on Supercomputing
Beyond loop partitioning: data assignment and overlap to reduce communication overhead
ICS '91 Proceedings of the 5th international conference on Supercomputing
Automatic transformation of FORTRAN loops to reduce cache conflicts
ICS '91 Proceedings of the 5th international conference on Supercomputing
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Hiding memory latency using dynamic scheduling in shared-memory multiprocessors
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
A transformational approach to compiling Sisal for distributed memory architectures
ICS '92 Proceedings of the 6th international conference on Supercomputing
Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
An efficient architecture for loop based data preloading
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The cedar system and an initial performance study
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Managing pages in shared virtual memory systems: getting the compiler into the game
ICS '93 Proceedings of the 7th international conference on Supercomputing
Data and program restructuring of irregular applications for cache-coherent multiprocessor
ICS '94 Proceedings of the 8th international conference on Supercomputing
Reducing cache conflicts in data cache prefetching
ACM SIGARCH Computer Architecture News - Special issue on input/output in parallel computer systems
A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Hardware implementation issues of data prefetching
ICS '95 Proceedings of the 9th international conference on Supercomputing
A limit study of local memory requirements using value reuse profiles
Proceedings of the 28th annual international symposium on Microarchitecture
An effective programmable prefetch engine for on-chip caches
Proceedings of the 28th annual international symposium on Microarchitecture
An integrated compile-time/run-time software distributed shared memory system
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Examination of a memory access classification scheme for pointer-intensive and numeric programs
ICS '96 Proceedings of the 10th international conference on Supercomputing
Tolerating latency in multiprocessors through compiler-inserted prefetching
ACM Transactions on Computer Systems (TOCS)
ICS '98 Proceedings of the 12th international conference on Supercomputing
Hardware-driven prefetching for pointer data references
ICS '98 Proceedings of the 12th international conference on Supercomputing
CPU Cache Prefetching: Timing Evaluation of Hardware Implementations
IEEE Transactions on Computers
The Cedar system and an initial performance study
25 years of the international symposia on Computer architecture (selected papers)
A General Interprocedural Framework for Placement of Split-Phase Large Latency Operations
IEEE Transactions on Parallel and Distributed Systems
An Integrated Hardware/Software Data Prefetching Scheme for Shared-Memory Multiprocessors
International Journal of Parallel Programming
Push vs. pull: data movement for linked data structures
Proceedings of the 14th international conference on Supercomputing
Reducing the impact of software prefetching on register pressure
SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
The Journal of Supercomputing
ACM Computing Surveys (CSUR)
A novel renaming mechanism that boosts software prefetching
ICS '01 Proceedings of the 15th international conference on Supercomputing
Page replacement using marginal loss functions
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
Stride-directed Prefetching for Secondary Caches
ICPP '97 Proceedings of the international Conference on Parallel Processing
An adaptive sequential prefetching scheme in shared-memory multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
Towards OpenMP Execution on Software Distributed Shared Memory Systems
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Exploiting the Prefetching Effect Provided by Executing Mispredicted Load Instructions
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Program balance and its impact on high performance RISC architectures
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Optimizing OpenMP programs on software distributed shared memory systems
International Journal of Parallel Programming - Special issue: OpenMP: Experiences and implementations
Compiler Optimization Technique for Data Cache Prefetching Using a Small CAM Array
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Characteristics of workloads used in high performance and technical computing
Proceedings of the 21st annual international conference on Supercomputing
Optimal multistream sequential prefetching in a shared cache
ACM Transactions on Storage (TOS)
Hi-index | 0.01 |
Memory hierarchies are used by multiprocessor systems to reduce large memory access times. It is necessary to automatically manage such a hierarchy, to obtain effective memory utilization. In this paper, we discuss the various issues involved in obtaining an optimal memory management strategy for a memory hierarchy. We present an algorithm for finding the earliest point in a program that a block of data can be prefetched. This determination is based on the control and data dependencies in the program. Such a method is an integral part of more general memory management algorithms. We demonstrate our method's potential by using static analysis to estimate the performance improvement afforded by our prefetching strategy and to analyze the reference patterns in a set of Fortran benchmarks. We also study the effectiveness of prefetching in a realistic shared-memory system using an RTL-level simulator and real codes. This differs from previous studies by considering prefetching benefits in the presence of network contention.