Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
IEEE Transactions on Computers
Future execution: A prefetching mechanism that uses multiple cores to speed up single threads
ACM Transactions on Architecture and Code Optimization (TACO)
A performance-correctness explicitly-decoupled architecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 36th annual international symposium on Computer architecture
Hi-index | 0.00 |
Load misses in on-chip L2 caches often end up stalling modern superscalars. To address this problem, we propose hiding L2 misses with Checkpoint-Assisted VAlue prediction (CAVA). When a load misses in L2, a predicted value is returned to the processor. If the missing load reaches the head of the reorder buffer before the requested data is received from memory, the processor checkpoints, consumes the predicted value, and speculatively continues execution. When the requested data finally arrives, it is compared to the predicted value. If the prediction was correct, execution continues normally; otherwise, execution rolls back to the checkpoint. Compared to a baseline aggressive superscalar, CAVA speeds up execution by a geometric mean of 1.14 for SPECint and 1.34 for SPECfp applications. Additionally, CAVA is faster than an implementation of Runahead execution, and Runahead with value prediction.