We propose a 2-level data cache architecture with a low energy-delay product, tailored for embedded systems. The L1 data cache is small, direct-mapped, and write-through; the L2 data cache is set-associative and write-back. The L1 data cache can therefore be accessed quickly and provide high bandwidth, while the L2 data cache is effective at reducing the global miss rate. To reduce the miss penalty incurred by the small L1 data cache, we propose an ECP (Early Cache hit Predictor) scheme. The ECP predicts whether the L1 cache holds the requested data, combining partial address generation with L1 hit prediction; when an L1 miss is predicted, the L2 data cache is accessed directly, avoiding a sequential L1 lookup. To reduce the high energy cost of accessing the L2 data cache under the heavy write-through traffic between the two cache levels, we propose a one-way write scheme. In our simulation-based experiments, the proposed 2-level data cache architecture improves overall system performance by 3.6% on average and reduces the energy consumed by the data caches and address generation by 50%.
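To make the ECP idea concrete, the following C sketch models how a hit predictor indexed by a partially generated address might steer a load either to the L1 probe or directly to the L2. All structure sizes, names (partial_index, predict_hit, and so on), and the training policy are illustrative assumptions, not details taken from the paper, and the one-way write scheme is not modeled.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sizes only, not the paper's configuration. */
#define L1_LINES     256    /* direct-mapped L1 data cache lines */
#define LINE_BITS    5      /* 32-byte cache lines               */
#define PRED_ENTRIES 1024   /* ECP-style hit-prediction table    */

static uint32_t l1_tag[L1_LINES];
static uint8_t  l1_valid[L1_LINES];
static uint8_t  predict_hit[PRED_ENTRIES];  /* 1 = predict an L1 hit */

/* Partial address generation: the low-order bits of base + offset do
   not depend on carries out of the upper bits, so hardware can produce
   them with a short adder before the full effective address is ready.
   Masking after a full add models that narrow adder. */
static uint32_t partial_index(uint32_t base, uint32_t offset) {
    uint32_t low = (base + offset) & ((PRED_ENTRIES << LINE_BITS) - 1);
    return low >> LINE_BITS;   /* drop line-offset bits; < PRED_ENTRIES */
}

/* One load: consult the predictor early; on a predicted miss the L2
   would be accessed directly, skipping the L1 probe latency. */
static void access(uint32_t base, uint32_t offset) {
    uint32_t idx  = partial_index(base, offset);
    uint32_t addr = base + offset;              /* full effective address */
    uint32_t line = (addr >> LINE_BITS) % L1_LINES;
    uint32_t tag  = addr >> LINE_BITS;

    int actual_hit = l1_valid[line] && l1_tag[line] == tag;

    if (predict_hit[idx])
        printf("0x%08x: predicted L1 hit,  actual %s\n",
               addr, actual_hit ? "hit" : "miss");
    else
        printf("0x%08x: predicted L1 miss, going straight to L2 "
               "(actual %s)\n", addr, actual_hit ? "hit" : "miss");

    /* After the access the line is resident in L1 (filled on a miss;
       the write-through L1 needs no dirty-line handling), so train the
       predictor to expect a hit next time. */
    l1_valid[line]   = 1;
    l1_tag[line]     = tag;
    predict_hit[idx] = 1;
}

int main(void) {
    access(0x1000, 0x20);   /* cold: predicted miss, actual miss */
    access(0x1000, 0x20);   /* warm: predicted hit,  actual hit  */
    access(0x8000, 0x40);   /* new predictor entry: predicted miss */
    return 0;
}
```

In a real pipeline the prediction would only steer the access; the full tag comparison still decides correctness, so aliasing in the partially generated index costs at most a misprediction, never a wrong result.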