Checkpoint repair for out-of-order execution machines
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Implementing Precise Interrupts in Pipelined Processors
IEEE Transactions on Computers
Improving data cache performance by pre-executing instructions under a cache miss
ICS '97 Proceedings of the 11th international conference on Supercomputing
Proceedings of the 24th annual international symposium on Computer architecture
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
Reducing branch misprediction penalties via dynamic control independence detection
ICS '99 Proceedings of the 13th international conference on Supercomputing
Dynamic memory disambiguation in the presence of out-of-order store issuing
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Register integration: a simple and efficient implementation of squash reuse
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
A study of slipstream processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Slipstream processors: improving both performance and fault tolerance
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Skipper: a microarchitecture for exploiting control-flow independence
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
Automatic Generation of Microarchitecture Simulators
ICCL '98 Proceedings of the 1998 International Conference on Computer Languages
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Techniques for Efficient Processing in Runahead Execution Engines
Proceedings of the 32nd annual international symposium on Computer Architecture
Reducing Branch Misprediction Penalty via Selective Branch Recovery
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Fast branch misprediction recovery in out-of-order superscalar processors
Proceedings of the 19th annual international conference on Supercomputing
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research
IEEE Computer Architecture Letters
On Reusing the Results of Pre-Executed Instructions in a Runahead Execution Processor
IEEE Computer Architecture Letters
Hi-index | 0.00 |
We show that a multi-threaded processor that is aware of the processor state in a fine-grain manner can improve single-thread performance significantly by assigning the task of maintaining the correct processor state to an independent thread. We develop fine-grain state maintenance techniques that can be applied in multi-threaded environments and present a fine-grain state application of runahead execution where the data values dependent on a missed load are treated as damaged values. These values are verified and recovered as necessary by an independent thread. We evaluate an SMT-like fine grain state processor and show that it obtains an average of 38.9% and up to 160.0% better performance than coarse-grain baseline processors on the SPEC CFP2000 benchmark suite.