High-bandwidth address translation for multiple-issue processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Reducing TLB power requirements
ISLPED '97 Proceedings of the 1997 international symposium on Low power electronics and design
Way-predicting set-associative cache for high performance and low energy consumption
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Generating physical addresses directly for saving instruction TLB energy
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Energy efficient D-TLB and data cache using semantic-aware multilateral partitioning
Proceedings of the 2003 international symposium on Low power electronics and design
A selective filter-bank TLB system
Proceedings of the 2003 international symposium on Low power electronics and design
Compiler-directed code restructuring for reducing data TLB energy
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
The interval page table: virtual memory support in real-time and memory-constrained embedded systems
Proceedings of the 20th annual conference on Integrated circuits and systems design
Direct address translation for virtual memory in energy-efficient embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Modern processors can issue and execute multiple instructions per cycle, often performing multiple memory operations simultaneously. To reduce stalls due to resource conflicts, most processors employ multi-ported L1 caches and TLBs to enable concurrent memory accesses. In this paper, we observe that data TLB lookups within a cycle and across consecutive cycles are often synonymous --- they go to the same page. To exploit this finding, we propose two new mechanisms --- intra-cycle compaction and inter-cycle compaction of address translation requests in order to save energy in the data TLB. Our results show that average energy savings of 27% using intra-cycle, 42% using inter-cycle in a conventional d-TLB, and 56% using inter-cycle compaction in semantic-aware d-TLBs can be achieved. When these 2 compaction techniques are combined together and applied to both the i-TLB and semantic-aware d-TLBs, an average energy savings of 76% (up to 87%) is obtained