Crafting a compiler with C
Architectural support for reduced register saving/restoring in single-window register files
ACM Transactions on Computer Systems (TOCS)
The effect of context switches on cache performance
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The interaction of architecture and operating system design
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Exploiting short-lived variables in superscalar processors
Proceedings of the 28th annual international symposium on Microarchitecture
Whole-program optimization for time and space efficient threads
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Minimum cost interprocedural register allocation
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Designing high bandwidth on-chip caches
Proceedings of the 24th annual international symposium on Computer architecture
Register File Design Considerations in Dynamically Scheduled Processors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Effective jump-pointer prefetching for linked data structures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Storageless value prediction using prior register values
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
Slipstream processors: improving both performance and fault tolerance
ACM SIGPLAN Notices
A study of slipstream processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Exploiting data forwarding to reduce the power budget of VLIW embedded processors
Proceedings of the conference on Design, automation and test in Europe
Integrating superscalar processor components to implement register caching
ICS '01 Proceedings of the 15th international conference on Supercomputing
Slipstream processors: improving both performance and fault tolerance
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Dynamic dead-instruction detection and elimination
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Optimization for the Intel® Itanium® architecture register stack
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Proceedings of the 31st annual international symposium on Computer architecture
Late Allocation and Early Release of Physical Registers
IEEE Transactions on Computers
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Compiler Directed Early Register Release
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Selective writeback: exploiting transient values for energy-efficiency and performance
Proceedings of the 2006 international symposium on Low power electronics and design
Compacting register file via 2-level renaming and bit-partitioning
Microprocessors & Microsystems
IEEE Transactions on Computers
Hardware support for early register release
International Journal of High Performance Computing and Networking
International Journal of High Performance Computing and Networking
Achieving Out-of-Order Performance with Almost In-Order Complexity
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Selective writeback: reducing register file pressure and energy consumption
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Exploring the limits of early register release: Exploiting compiler analysis
ACM Transactions on Architecture and Code Optimization (TACO)
Energy-efficient renaming with register versioning
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Energy-efficient register caching with compiler assistance
ACM Transactions on Architecture and Code Optimization (TACO)
LIRAC: using live range information to optimize memory access
ARCS'07 Proceedings of the 20th international conference on Architecture of computing systems
Saving register-file static power by monitoring instruction sequence in ROB
Journal of Systems Architecture: the EUROMICRO Journal
Live range aware cache architecture
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Exploring the potential of architecture-level power optimizations
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.01 |
We describe Dead Value Information (DVI) and introduce three new optimizations which exploit it. DVI provides assertions that certain register values are dead, meaning they will not be read before being overwritten. The processor can use DVI to track dead registers and dynamically eliminate unnecessary save and restore instructions from the execution stream at procedure calls and context switches. Our results indicate that dynamic saves and restore instances can be reduced by 46% for procedure calls and by 51% for context switches. In addition, save/restore elimination for procedure calls can improve overall performance by up to 5%. DVI also allows the processor manage physical registers to efficiently, reducing the size requirements of the physical register file. When the system clock rate is proportional to the register file cycle time, this optimization can improve performance. All of these optimizations can be supported with only a few new instructions and minimal additional hardware structures.