Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Introduction to algorithms
Shade: a fast instruction-set simulator for execution profiling
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor
Digital Technical Journal - Special 10th anniversary issue
The Omega Library interface guide
The Omega Library interface guide
Cache design trade-offs for power and performance optimization: a case study
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory data organization for improved cache performance in embedded processor applications
ACM Transactions on Design Automation of Electronic Systems (TODAES)
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
Improving locality using loop and data transformations in an integrated framework
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts
IEEE Transactions on Parallel and Distributed Systems
Improving memory hierarchy performance for irregular applications
ICS '99 Proceedings of the 13th international conference on Supercomputing
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Memory system energy (poster session): influence of hardware-software optimizations
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Cache conscious data layout organization for embedded multimedia applications
Proceedings of the conference on Design, automation and test in Europe
Computers as components: principles of embedded computing system design
Computers as components: principles of embedded computing system design
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Multiprocessors from a Software Perspective
IEEE Micro
Advanced Data Layout Optimization for Multimedia Applications
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Register allocation & spilling via graph coloring
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Data Relation Vectors: A New Abstraction for Data Optimizations
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Energy-Efficiency of VLSI Caches: A Comparative Study
VLSID '97 Proceedings of the Tenth International Conference on VLSI Design: VLSI in Multimedia Applications
Memory optimizations and exploration for embedded systems
Memory optimizations and exploration for embedded systems
Hi-index | 0.00 |
On-chip caches consume a significant fraction of the energy in current microprocessors. As a result, architectural/circuit-level techniques such as block buffering and sub-banking have been proposed and shown to be very effective in reducing the energy consumption of on-chip caches. While there has been some work on evaluating the energy and performance impact of different block buffering schemes, we are not aware of software solutions to take advantage of on-chip cache block buffers.This article presents a compiler-based approach that modifies code and variable layout to take better advantage of block buffering. The proposed technique is aimed at a class of embedded codes that make heavy use of scalar variables. Unlike previous work that uses only storage pattern optimization or only access pattern optimization, we propose an integrated approach that uses both code restructuring (which affects the access sequence) and storage pattern optimization (which determines the storage layout of variables). We use a graph-based formulation of the problem and present a solution for determining suitable variable placements and accompanying access pattern transformations. The proposed technique has been implemented using an experimental compiler and evaluated using a set of complete programs. The experimental results demonstrate that our approach leads to significant energy savings. Based on these results, we conclude that compiler support is complementary to architecture and circuit-based techniques to extract the best energy behavior from a cache subsystem that employs block buffering.