ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Slice-processors: an implementation of operation-based prediction
ICS '01 Proceedings of the 15th international conference on Supercomputing
Difficult-path branch prediction using subordinate microthreads
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Speculative lock elision: enabling highly concurrent multithreaded execution
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Speculative synchronization: applying thread-level speculation to explicitly parallel applications
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Cherry: checkpointed early resource recycling in out-of-order microprocessors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Enhancing memory level parallelism via recovery-free value prediction
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
ICSE '81 Proceedings of the 5th international conference on Software engineering
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Improving Value Communication for Thread-Level Speculation
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Cost effective dynamic program slicing
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Coherence decoupling: making use of incoherence
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Scalable selective re-execution for EDGE architectures
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Checkpointed Early Load Retirement
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Out-of-Order Commit Processors
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Understanding Scheduling Replay Schemes
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation
Proceedings of the 19th annual international conference on Supercomputing
CAVA: Hiding L2 Misses with Checkpoint-Assisted Value Prediction
IEEE Computer Architecture Letters
Tolerating Dependences Between Large Speculative Threads Via Sub-Threads
Proceedings of the 33rd annual international symposium on Computer Architecture
Adaptive reorder buffers for SMT processors
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
SlicK: slice-based locality exploitation for efficient redundant multithreading
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Transparent control independence (TCI)
Proceedings of the 34th annual international symposium on Computer architecture
An L2-miss-driven early register deallocation for SMT processors
Proceedings of the 21st annual international conference on Supercomputing
Hiding the misprediction penalty of a resource-efficient high-performance processor
ACM Transactions on Architecture and Code Optimization (TACO)
Improving single-thread performance with fine-grain state maintenance
Proceedings of the 5th conference on Computing frontiers
Reducing register pressure in SMT processors through L2-miss-driven early register release
ACM Transactions on Architecture and Code Optimization (TACO)
Reexecution and Selective Reuse in Checkpoint Processors
Transactions on High-Performance Embedded Architectures and Compilers II
Forwardflow: a scalable core for power-constrained CMPs
Proceedings of the 37th annual international symposium on Computer architecture
RETCON: transactional repair without replay
Proceedings of the 37th annual international symposium on Computer architecture
Distributed replay protocol for distributed uniprocessors
Proceedings of the 26th ACM international conference on Supercomputing
Speculative parallelization: eliminating the overhead of failure
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
As more data value speculation mechanisms are being proposed to speed-up processors, there is growing pressure on the critical processor structures that must buffer the state of the speculative instructions. A scalable solution is to checkpoint the processor and retire speculative instructions. However, in this environment, misprediction recovery becomes very wasteful, as it involves discarding and re-executing all the instructions executed since the checkpoint. To speed-up execution in this environment, this paper presents a novel architecture (ReSlice) that selectively re-executes only the speculatively-retired instructions that directly depended on the mispredicted value, namely its Forward Slice. ReSlice buffers the (typically very few) instructions in the forward slice of the predicted value as such instructions initially execute. Then, potentially thousands of instructions later, ReSlice can quickly re-execute the slice if a misprediction is declared, and merge its state with the program state. In addition, this paper develops a sufficient condition for correct slice re-execution and merge. As one possible use of ReSlice, we apply it to recover from cross-task dependence violations in a chip multiprocessor with Thread-Level Speculation (TLS). ReSlice speeds up SpecInt applications over aggressive TLS by up to 33%, with a geometric mean of 12%. Moreover, E 脳 D2 decreases by 20%. All this is obtained by saving on average 61% of the task squashes through slice re-execution. On average, a slice re-executes only 6.6 instructions, compared to the 210 that would be re-executed on a squash.