Three extensions to register integration

Authors:
Vlad Petric;Anne Bracy;Amir Roth
Affiliations:
University of Pennsylvania;University of Pennsylvania;University of Pennsylvania
Venue:
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Year:
2002

Citing 13
Cited 5

Dynamic instruction reuse

Proceedings of the 24th annual international symposium on Computer architecture
Improving the accuracy and performance of memory communication through renaming

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Streamlining inter-operation memory communication via data dependence prediction

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Accelerating multi-media processing by implementing memoing in multiplication and division units

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Reducing branch misprediction penalties via dynamic control independence detection

ICS '99 Proceedings of the 13th international conference on Supercomputing
Dynamic removal of redundant computations

ICS '99 Proceedings of the 13th international conference on Supercomputing
DIVA: a reliable substrate for deep submicron microarchitecture design

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Register integration: a simple and efficient implementation of squash reuse

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
The Alpha 21264 Microprocessor

IEEE Micro
Stack Value File: Custom Microarchitecture for the Stack

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Speculative Data-Driven Multithreading

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Dynamic instruction reuse

Dynamic instruction reuse

Continuous Optimization

Proceedings of the 32nd annual international symposium on Computer Architecture
RENO: A Rename-Based Instruction Optimizer

Proceedings of the 32nd annual international symposium on Computer Architecture
Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization

Proceedings of the 32nd annual international symposium on Computer Architecture
Scalable Store-Load Forwarding via Store Queue Index Prediction

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
NoSQ: Store-Load Communication without a Store Queue

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Register integration (or just integration) is a register renaming discipline that implements instruction reuse via physical register sharing. Initially developed to perform squash reuse, the integration mechanism can exploit more reuse scenarios. Here, we describe three extensions to the original design that expand its applicability and boost its performance impact. First, we extend squash reuse to general reuse. Whereas squash reuse maintains the concept of an instruction instance "owning" its output register, we allow multiple instructions to simultaneously share a single register. Next, we replace the PC indexing scheme with an opcode-based indexing scheme that exposes more integration opportunities. Finally, we introduce an extension called reverse integration in which we speculatively create integration entries for the inverses of operations---for instance, when renaming an add, we create an entry for the inverse subtract. Reverse integration allows us to reuse operations that the program itself has not executed yet. We use reverse integration to implement speculative memory bypassing for stack-pointer based loads (register fills and restores).Our evaluation shows that these extensions increase the integration rate---the number of retired instructions that integrate older results and bypass the execution engine---to an average of 15% on the SPEC2000 integer benchmarks. On a 4-way superscalar processor with an aggressive memory system, this translates into an average IPC improvement of 7%. The fact that integrating instructions completely bypass the execution engine raises the possibility of using integration as a low-complexity substitute for execution bandwidth and issue buffering. Our experiments show that such a trade-off is possible, enabling a range of IPC/complexity designs.