Abstract: As processor performance increases, there is a corresponding increase in the demands placed on the memory system, including caches. Prior research has proposed partitioning the cache into instruction/data, temporal/non-temporal, and/or stack/non-stack regions. Each of these designs can improve performance by constructing two separate structures that can be probed in parallel while reducing contention. In this paper, we propose a new memory organization that partitions data references into stack and non-stack references. Non-stack references are routed to a conventional cache. Stack references, on the other hand, are shown to have several characteristics that can be leveraged to improve performance using a less conventional storage organization. This paper enumerates those characteristics and proposes a new microarchitectural feature, the stack value file (SVF), which exploits them to improve instruction-level parallelism, reduce stack access latencies, reduce demand on the first-level cache, and reduce data bus traffic. Our results show that the SVF can improve execution performance by 29 to 65% while reducing overhead traffic for the stack region by many orders of magnitude over cache structures of the same size.
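To make the partitioning idea concrete, below is a minimal C sketch of the routing step the abstract describes: each data reference is classified as stack or non-stack, and stack references are directed to a small, directly indexed structure standing in for the SVF while non-stack references would go to a conventional data cache. The address-range test, the SVF_ENTRIES/STACK_TOP/STACK_SIZE constants, and the svf_access() helper are illustrative assumptions, not the detection mechanism or parameters used in the paper; the dl1_refs counter merely stands in for a conventional cache model.

/* Minimal sketch (assumptions, not the paper's design): classify each data
 * reference as stack or non-stack and route it to a toy SVF or to a counter
 * that stands in for a conventional L1 data cache. */
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

#define SVF_ENTRIES 256              /* assumed SVF capacity, in words      */
#define STACK_TOP   0x7fff0000u      /* assumed top of the stack region     */
#define STACK_SIZE  0x00010000u      /* assumed size of the stack region    */

static uint32_t svf[SVF_ENTRIES];    /* SVF modeled as a word-addressed array */
static unsigned svf_refs, dl1_refs;  /* reference counts for each structure   */

/* A reference is treated as "stack" if it falls inside the assumed region. */
static bool is_stack_ref(uint32_t addr) {
    return addr >= STACK_TOP - STACK_SIZE && addr < STACK_TOP;
}

/* Directly index the SVF by word offset from the top of the stack region. */
static uint32_t *svf_access(uint32_t addr) {
    uint32_t word_off = (STACK_TOP - 4 - addr) >> 2;
    return &svf[word_off % SVF_ENTRIES];
}

static void handle_store(uint32_t addr, uint32_t value) {
    if (is_stack_ref(addr)) { *svf_access(addr) = value; svf_refs++; }
    else                    { dl1_refs++; /* would probe the conventional cache */ }
}

static uint32_t handle_load(uint32_t addr) {
    if (is_stack_ref(addr)) { svf_refs++; return *svf_access(addr); }
    dl1_refs++;             /* would probe the conventional cache */
    return 0;
}

int main(void) {
    handle_store(STACK_TOP - 8, 42);          /* stack reference -> SVF   */
    uint32_t v = handle_load(STACK_TOP - 8);  /* stack reference -> SVF   */
    handle_load(0x10000000u);                 /* heap reference -> cache  */
    printf("loaded %u, SVF accesses=%u, cache accesses=%u\n", v, svf_refs, dl1_refs);
    return 0;
}

In this toy model the stack region is a fixed address range; a real implementation would track the architected stack pointer and handle spills when accesses fall outside the structure's capacity, considerations the sketch deliberately omits.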