Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Tiling multidimensional iteration spaces for nonshared memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
VLIW compilation techniques in a superscalar environment
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Compiling for numa parallel machines
Compiling for numa parallel machines
Supporting dynamic data structures on distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Cache miss equations: an analytical representation of cache misses
ICS '97 Proceedings of the 11th international conference on Supercomputing
Parallelizing nonnumerical code with selective scheduling and software pipelining
ACM Transactions on Programming Languages and Systems (TOPLAS)
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Precise miss analysis for program transformations with caches of arbitrary associativity
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
System-level power optimization: techniques and tools
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
A compiler technique for improving whole-program locality
POPL '01 Proceedings of the 28th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Reducing leakage in a high-performance deep-submicron instruction cache
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on low power electronics and design
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
DRG-cache: a data retention gated-ground cache for low power
Proceedings of the 39th annual Design Automation Conference
Drowsy caches: simple techniques for reducing leakage power
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Design of High-Performance Microprocessor Circuits
Design of High-Performance Microprocessor Circuits
Low-leakage asymmetric-cell SRAM
Proceedings of the 2002 international symposium on Low power electronics and design
Adaptive Mode Control: A Static-Power-Efficient Cache Design
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Leakage Energy Management in Cache Hierarchies
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Compiler-directed instruction cache leakage optimization
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Integrating Loop and Data Transformations for Global Optimisation
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Static Energy Reduction Techniques for Microprocessor Caches
ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Instruction cache leakage reduction by changing register operands and using asymmetric sram cells
Proceedings of the 18th ACM Great Lakes symposium on VLSI
Reducing leakage power with BTB access prediction
Integration, the VLSI Journal
An utilization driven framework for energy efficient caches
HiPC'08 Proceedings of the 15th international conference on High performance computing
An ESL approach for energy consumption analysis of cache memories in SoC platforms
International Journal of Reconfigurable Computing - Special issue on selected papers from the southern programmable logic conference (SPL2010)
Transactions on high-performance embedded architectures and compilers III
Hi-index | 0.00 |
Silicon technology advances have made it possible to pack millions of transistors---switching at high clock speeds---on a single chip. While these advances bring unprecedented performance to electronic products, they also pose difficult power/energy consumption problems. For example, large number of transistors in dense on-chip cache memories consume significant static (leakage) power even if the cache is not used by the current computation. While previous compiler research studied code and data restructuring for improving data cache performance, to our knowledge, there exists no compiler-based study that targets data cache leakage power consumption. In this paper, we present code restructuring techniques for array-based and pointer-intensive applications for reducing data cache leakage energy consumption. The idea is to let the compiler analyze the application code and insert instructions that turn off cache lines that keep variables not used by the current computation. This turning-off does not destroy contents of a cache line and waking up the cache line (when it is accessed later) does not incur much overhead. Due to inherent data locality in applications, we find that, at a given time, only a small portion of the data cache needs to be active; the remaining part can be placed into a leakage-saving mode (state); i.e., they can be turned off. Our experimental results indicate that the proposed compiler-based strategy reduces the cache energy consumption significantly. We also demonstrate how different compiler optimizations can increase the effectiveness of our strategy.