Abstract: As processor performance increases, there is a corresponding increase in the demands placed on the memory system, including caches. Prior research has proposed partitioning the cache into instruction/data, temporal/non-temporal, and/or stack/non-stack regions. Each of these designs can improve performance by constructing two separate structures that can be probed in parallel while reducing contention. In this paper, we propose a new memory organization that partitions data references into stack and non-stack references. Non-stack references are routed to a conventional cache. Stack references, on the other hand, are shown to have several characteristics that can be leveraged to improve performance using a less conventional storage organization. This paper enumerates those characteristics and proposes a new microarchitectural feature, the stack value file (SVF), which exploits them to improve instruction-level parallelism, reduce stack access latencies, reduce demand on the first-level cache, and reduce data bus traffic. Our results show that the SVF can improve execution performance by 29 to 65% while reducing overhead traffic for the stack region by many orders of magnitude over cache structures of the same size.
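To make the partitioning idea concrete, below is a minimal C sketch of the routing step the abstract describes: each data reference is classified as stack or non-stack, and stack references are directed to a small, directly indexed structure standing in for the SVF while non-stack references would go to a conventional data cache. The address-range test, the SVF_ENTRIES/STACK_TOP/STACK_SIZE constants, and the svf_access() helper are illustrative assumptions, not the detection mechanism or parameters used in the paper; the dl1_refs counter merely stands in for a conventional cache model.

/* Minimal sketch (assumptions, not the paper's design): classify each data
 * reference as stack or non-stack and route it to a toy SVF or to a counter
 * that stands in for a conventional L1 data cache. */
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

#define SVF_ENTRIES 256              /* assumed SVF capacity, in words      */
#define STACK_TOP   0x7fff0000u      /* assumed top of the stack region     */
#define STACK_SIZE  0x00010000u      /* assumed size of the stack region    */

static uint32_t svf[SVF_ENTRIES];    /* SVF modeled as a word-addressed array */
static unsigned svf_refs, dl1_refs;  /* reference counts for each structure   */

/* A reference is treated as "stack" if it falls inside the assumed region. */
static bool is_stack_ref(uint32_t addr) {
    return addr >= STACK_TOP - STACK_SIZE && addr < STACK_TOP;
}

/* Directly index the SVF by word offset from the top of the stack region. */
static uint32_t *svf_access(uint32_t addr) {
    uint32_t word_off = (STACK_TOP - 4 - addr) >> 2;
    return &svf[word_off % SVF_ENTRIES];
}

static void handle_store(uint32_t addr, uint32_t value) {
    if (is_stack_ref(addr)) { *svf_access(addr) = value; svf_refs++; }
    else                    { dl1_refs++; /* would probe the conventional cache */ }
}

static uint32_t handle_load(uint32_t addr) {
    if (is_stack_ref(addr)) { svf_refs++; return *svf_access(addr); }
    dl1_refs++;             /* would probe the conventional cache */
    return 0;
}

int main(void) {
    handle_store(STACK_TOP - 8, 42);          /* stack reference -> SVF   */
    uint32_t v = handle_load(STACK_TOP - 8);  /* stack reference -> SVF   */
    handle_load(0x10000000u);                 /* heap reference -> cache  */
    printf("loaded %u, SVF accesses=%u, cache accesses=%u\n", v, svf_refs, dl1_refs);
    return 0;
}

In this toy model the stack region is a fixed address range; a real implementation would track the architected stack pointer and handle spills when accesses fall outside the structure's capacity, considerations the sketch deliberately omits.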