Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Shade: a fast instruction-set simulator for execution profiling
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor
Digital Technical Journal - Special 10th anniversary issue
Cache design trade-offs for power and performance optimization: a case study
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
A quantitative analysis of loop nest locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Adapting cache line size to application behavior
ICS '99 Proceedings of the 13th international conference on Supercomputing
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Automatic and efficient evaluation of memory hierarchies for embedded systems
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
IEEE Transactions on Computers
Energy-driven integrated hardware-software optimizations using SimplePower
Proceedings of the 27th annual international symposium on Computer architecture
Reconfigurable caches and their application to media processing
Proceedings of the 27th annual international symposium on Computer architecture
Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Memory system energy (poster session): influence of hardware-software optimizations
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
Design of High-Performance Microprocessor Circuits
Design of High-Performance Microprocessor Circuits
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Architectural adaptation for application-specific locality optimizations
ICCD '97 Proceedings of the 1997 International Conference on Computer Design (ICCD '97)
Hardware techniques to improve the performance of the processor/memory interface
Hardware techniques to improve the performance of the processor/memory interface
Compiler-directed cache polymorphism
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Analyzing data reuse for cache reconfiguration
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Computer architects have tried to mitigate the consequences of high memory latencies using a variety techniques. An example of these techniques is multi-level caches to counteract the latency that results from having a memory that is slower than the processor. Recent research has demonstrated that compiler optimizations that modify data layouts and restructure computation can be successful in improving memory system performance. However, in many cases, working with a fixed cache configuration prevents the application/compiler from obtaining the maximum performance. In addition, prompted by demands in portability, long battery life, and low-cost packaging, the computer industry has started viewing energy and power as decisive design factors, along with performance and cost. This makes the job of the compiler/user even more difficult as one needs to strike a balance between low power/energy consumption and high performance. Consequently, adapting the code to the underlying cache/memory hierarchy is becoming more and more difficult.In this paper, we take an alternate approach and attempt to adapt the cache architecture to the software needs. We focus on array-dominated applications and measure the potential benefits that could be gained from a morphable (reconfigurable) cache architecture. Our results show that not only different applications work best with different cache configurations, but also that different loop nests in a given application demand different configurations. Our results also indicate that the most suitable cache configuration for a given application or a single nest depends strongly on the objective function being optimized. For example, minimizing cache memory energy requires a different cache configuration for each nest than an objective which tries to minimize the overall memory system energy. Based on our experiments, we conclude that fine-grain (loop nest-level) cache configuration management is an important step for a solution to the challenging architecture/software tradeoffs awaiting system designers in the future.