MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors
Proceedings of the 2002 international symposium on Low power electronics and design
U-cache: a cost-effective solution to synonym problem
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Towards Virtually-Addressed Memory Hierarchies
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A Banked-Promotion TLB for High Performance and Low Power
ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Energy-effcient physically tagged caches for embedded processors with virtual memory
Proceedings of the 42nd annual Design Automation Conference
Compiler-Directed Code Restructuring for Reducing Data TLB Energy
CODES+ISSS '04 Proceedings of the international conference on Hardware/Software Codesign and System Synthesis: 2004
Hi-index | 0.00 |
In this paper we present a novel cache architecture for energy-efficient data caches in embedded processors with virtual memory. Application knowledge regarding the nature of memory references is used to eliminate tag address translations for most of the cache accesses. We introduce a novel cache tagging scheme, where both virtual and physical tags co-exist in the cache tag arrays. Physical tags and special handling for the super-set cache index bits are used for references to shared data regions in order to avoid cache consistency problems. By eliminating the need for address translation on cache access for the majority of references, a significant power reduction is achieved. We outline an efficient hardware architecture for the proposed approach, where the application information is captured in a reprogrammable way and the cache architecture is minimally modified. Our experimental results show energy reductions for the address translation hardware in the range of 90%, while the reduction for the entire cache architecture is within the range of 25%-30%.