Reducing the frequency of tag compares for low power I-cache design
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
Cache design trade-offs for power and performance optimization: a case study
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Architectural and compiler techniques for energy reduction in high-performance microprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low-power electronics and design
A low power unified cache architecture providing power and performance flexibility (poster session)
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Using dynamic cache management techniques to reduce energy in general purpose processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on system-level interconnect prediction
Power-aware partitioned cache architectures
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
SH3: High Code Density, Low Power
IEEE Micro
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
PEAS-III: An ASIP Design Environment
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Assigning Program and Data Objects to Scratchpad for Energy Reduction
Proceedings of the conference on Design, automation and test in Europe
Polynomial-time algorithm for on-chip scratchpad memory partitioning
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Partial Tag Comparison: A New Technology for Power-Efficient Set-Associative Cache Designs
VLSID '04 Proceedings of the 17th International Conference on VLSI Design
HotSpot cache: joint temporal and spatial locality exploitation for i-cache energy reduction
Proceedings of the 2004 international symposium on Low power electronics and design
A way-halting cache for low-energy high-performance systems
ACM Transactions on Architecture and Code Optimization (TACO)
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches
Proceedings of the 33rd annual international symposium on Computer Architecture
Dynamic allocation for scratch-pad memory using compile-time decisions
ACM Transactions on Embedded Computing Systems (TECS)
A dynamic code placement technique for scratchpad memory using postpass optimization
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Reducing energy in instruction caches by using multiple line buffers with prediction
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
DLIC: Decoded loop instructions caching for energy-aware embedded processors
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
On-chip instruction cache is a potential power hungry component in embedded systems due to its large chip area and high access-frequency. Aiming at reducing power consumption of the on-chip cache, we propose a Reduced One-Bit Tag Instruction Cache (ROBTIC), where the cache size is judiciously reduced and the cache tag field only contains the least significant bit of the full-tag. We develop a cache operational control scheme for ROBTIC so that with the one-bit cache tag, the program locality can still be efficiently exploited. For applications where most of the memory accesses are localized, our cache can achieve similar performance as a traditional full-tag cache; however, the power consumption of the cache can be significantly reduced due to the much smaller cache size, narrower tag array (just one bit), and tinier tag comparison circuit being used. Experiments on a set of benchmarks implemented in CMOS 180nm process technology demonstrate that our proposed design can reduce up to 27.3% dynamic power consumption and 30.9% area of the traditional cache when the cache size is fixed at 32 instructions, which outperforms the existing partial-tag based cache design. With the cache size customization, a further 47.8% power saving can be achieved. Our experimental results also show that when implemented in the deep sub-micron technologies where the leakage power is not ignorable, our design is still efficient - a coherent power saving trend (about 22%) has been observed for technologies from 130nm down to 65nm.