While there has been an abundance of recent papers on hardware and software approaches to improving the performance of memory accesses, few papers have addressed the problem from the program's point of view. There is a general notion that certain programs have larger working sets than others. However, there is no quantitative method for evaluating and comparing the memory requirements of programs.

This paper introduces the bandwidth spectrum for characterizing the memory requirements of a program's instruction and data streams. The bandwidth spectrum measures the average bandwidth requirement of a program as a function of available local memory. These measurements are performed under the most idealized conditions of perfect knowledge and perfect memory management. As such, they represent lower bounds on the memory requirements of programs. We present the bandwidth spectrums for a set of 22 benchmarks and show how they can be used to compare memory requirements and I/O requirements. The bandwidth spectrums also offer a convenient method for weighing the trade-offs among instruction issue rate, local memory capacity, and bandwidth into local memory.

Using the bandwidth spectrum, we show that at issue rates of four or less, bandwidth usually scales linearly with the issue rate. At higher issue rates, bandwidth can often scale superlinearly with respect to issue rate. Finally, we also investigate the effects of varying the input sets on the bandwidth spectrums.
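To make the idea of a bandwidth requirement measured as a function of local memory size more concrete, the following is a minimal sketch and not the paper's actual measurement method. It assumes a fully associative local memory managed by Belady's MIN replacement policy as a stand-in for "perfect knowledge and perfect memory management", requires every missed block to be allocated (no bypass), ignores write-backs and block size, and reports misses per reference as a proxy for bandwidth. The names `min_misses` and `bandwidth_spectrum` and the toy trace are hypothetical.

```python
def min_misses(trace, capacity):
    """Count misses for a reference trace under Belady's MIN policy.

    Assumption: fully associative memory holding `capacity` blocks,
    every missed block is allocated, write-backs are not counted.
    """
    # next_use[i] = index of the next reference to trace[i], or infinity.
    next_use = [0] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    resident = {}  # block -> index of its next use
    misses = 0
    for i, block in enumerate(trace):
        if block in resident:
            resident[block] = next_use[i]  # hit: refresh next-use time
            continue
        misses += 1
        if capacity == 0:
            continue  # no local memory: every reference goes off-chip
        if len(resident) >= capacity:
            # Evict the block whose next use lies farthest in the future.
            victim = max(resident, key=resident.get)
            del resident[victim]
        resident[block] = next_use[i]
    return misses


def bandwidth_spectrum(trace, capacities):
    """Map each capacity to misses per reference (a bandwidth proxy)."""
    return {c: min_misses(trace, c) / len(trace) for c in capacities}


if __name__ == "__main__":
    toy_trace = [0, 1, 2, 0, 1, 2]  # hypothetical block addresses
    for size, traffic in bandwidth_spectrum(toy_trace, [0, 1, 2, 3]).items():
        print(f"capacity={size} blocks  misses/reference={traffic:.2f}")
```

Plotting the returned values against capacity gives a curve in the spirit of the spectrum described above: traffic falls as local memory grows and flattens once the working set fits. Multiplying the per-reference traffic by an assumed block size and issue rate would turn it into a bandwidth figure, which is one way to see why the required bandwidth grows with issue rate.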