This paper presents Speculative-Aware Execution (SAE), a new architecture that uses speculative-awareness to mitigate two drawbacks of speculative execution: useless work (work that consumes speculative values and so produces incorrect results, or that is done on the wrong path) and redundant work (work that produces results already obtained). To achieve this, SAE attempts to partition the dynamic instruction stream into two disjoint parallel threads. The first is a speculative thread that is partially speculative-aware (the p-thread): it records its speculative state and uses it to avoid useless work caused by speculative values, but it does not account for its own control-flow violations. The second is a fully speculative-aware thread (the f-thread) that holds a full record of the p-thread's speculations, so it can steer the p-thread away from incorrect control-flow paths and can accurately identify the p-thread's correct work and avoid repeating it, which would otherwise be redundant. By eliminating useless and redundant work, SAE outperforms existing architectures that share a similar high-level microarchitecture while requiring only minor hardware additions and changes. Detailed experimental results confirm that SAE reduces the number of useless and redundant computations, and show an average performance improvement of 18% for the SPEC_INT2000 benchmarks.
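The two categories of wasted speculative work named above can be illustrated with a small sketch. It shows only the taxonomy, not the SAE hardware mechanism. The record type `SpecOp`, its field names, and the functions `classify` and `summarize` are hypothetical, introduced here for illustration, and the sketch assumes that each property of an operation is already known.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecOp:
    """One speculatively executed operation (hypothetical record layout)."""
    name: str
    uses_bad_value: bool    # consumed a speculative value that proved incorrect
    wrong_path: bool        # executed on a mispredicted control-flow path
    already_computed: bool  # a correct result for it had been obtained earlier


def classify(op: SpecOp) -> str:
    """Label an operation as 'useless', 'redundant', or 'useful'."""
    # Useless work: the result is incorrect or belongs to the wrong path.
    if op.uses_bad_value or op.wrong_path:
        return "useless"
    # Redundant work: the result is correct but was already available.
    if op.already_computed:
        return "redundant"
    return "useful"


def summarize(ops):
    """Count how many operations fall into each category."""
    return Counter(classify(op) for op in ops)


if __name__ == "__main__":
    trace = [
        SpecOp("load_a", uses_bad_value=False, wrong_path=False, already_computed=False),
        SpecOp("add_b", uses_bad_value=True, wrong_path=False, already_computed=False),
        SpecOp("mul_c", uses_bad_value=False, wrong_path=True, already_computed=False),
        SpecOp("load_d", uses_bad_value=False, wrong_path=False, already_computed=True),
    ]
    print(dict(summarize(trace)))
```

In this sketch an operation that is both incorrect and previously computed is counted as useless, since the incorrectness check runs first. That ordering is an illustrative choice, not something the abstract specifies.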