Improving single-thread performance with fine-grain state maintenance

Authors:
Peng Zhou; Soner Önder
Affiliations:
Michigan Technological University, Houghton, MI, USA;Michigan Technological University, Houghton, MI, USA
Venue:
Proceedings of the 5th conference on Computing frontiers
Year:
2008


Abstract

We show that a multi-threaded processor that is aware of the processor state in a fine-grain manner can improve single-thread performance significantly by assigning the task of maintaining the correct processor state to an independent thread. We develop fine-grain state maintenance techniques that can be applied in multi-threaded environments and present a fine-grain state application of runahead execution in which the data values dependent on a missed load are treated as damaged values. These values are verified and, as necessary, recovered by an independent thread. We evaluate an SMT-like fine-grain state processor and show that it obtains an average of 38.9% and up to 160.0% better performance than coarse-grain baseline processors on the SPEC CFP2000 benchmark suite.