Better exploration of region-level value locality with integrated computation reuse and value prediction

Authors:
Youfeng Wu; Dong-Yuan Chen; Jesse Fang
Affiliations:
Microprocessor Research Labs (MRL), Intel Corporation, Santa Clara, CA (all three authors)
Venue:
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Year:
2001

Abstract

Computation reuse and value prediction are two recent techniques for improving microprocessor performance by exploiting value locality. Both aim to break the data-dependence limit of traditional processors. In this paper, we propose a speculative multithreading scheme in which the same hardware can be used efficiently for both computation reuse and value prediction. For the SpecInt95 benchmarks, our experiments show that the integrated approach significantly outperforms either computation reuse or value prediction alone: it raises the speedup from 1.25 with computation reuse alone, and from 1.28 with value prediction alone, to 1.40. In particular, the integrated approach outperforms a computation reuse configuration with twice as many reuse buffer entries (a speedup of 1.40 versus 1.33). Furthermore, unlike the computation reuse approach, the integrated approach does not rely on value profiles during region formation, which makes it more suitable for production systems.
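
The distinction between the two techniques can be made concrete with a small software analogy. The sketch below is an illustration only and is not the hardware scheme proposed in the paper: the names RegionBuffer, Entry, and execute, the one-entry-per-region table, and the last-value prediction policy are all assumptions introduced here. It treats a region as a side-effect-free function of its live-in values. When the recorded inputs match, the recorded outputs are exact and the region can be skipped (reuse). When they do not match, the recorded outputs can still serve as a guess that must be verified by running the region (prediction).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Values = Tuple[int, ...]


@dataclass
class Entry:
    inputs: Values   # live-in values seen on the recorded execution
    outputs: Values  # live-out values produced by that execution


class RegionBuffer:
    """Hypothetical table shared by reuse and prediction (one entry per region)."""

    def __init__(self) -> None:
        self.table: Dict[str, Entry] = {}

    def execute(
        self,
        region_id: str,
        inputs: Values,
        compute: Callable[[Values], Values],
    ) -> Tuple[Values, str]:
        entry = self.table.get(region_id)
        if entry is not None and entry.inputs == inputs:
            # Reuse: identical inputs, so the recorded outputs are exact
            # and the region does not need to run at all.
            return entry.outputs, "reuse"

        # No input match: the region must run. In a speculative machine the
        # recorded outputs could be used as a prediction in the meantime;
        # here we only report whether that prediction would have been right.
        actual = compute(inputs)
        if entry is None:
            outcome = "miss"
        elif entry.outputs == actual:
            outcome = "predict-hit"
        else:
            outcome = "predict-miss"
        self.table[region_id] = Entry(inputs, actual)
        return actual, outcome


def parity(values: Values) -> Values:
    """Example region: a pure function of its single live-in value."""
    return (values[0] & 1,)


buf = RegionBuffer()
trace = [buf.execute("r0", (x,), parity)[1] for x in (4, 4, 6, 7)]
print(trace)
```

In this toy trace the second call reuses the first result because the input repeats, the third call has a new input but the same output (so a last-value prediction would have been correct), and the fourth call changes the output (so the prediction would have been wrong). This is only meant to show why a single table of recorded inputs and outputs can serve both purposes.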