Data speculation has been heavily exploited in many areas of architecture design. It bridges the gap in time or space between a data producer and a data consumer, which gives processors the opportunity to gain significant speedups. However, large instruction windows, deep pipelines, and the increasing latency of on-chip communication make data misspeculation very expensive in modern processors. This paper proposes a Distributed Replay Protocol (DRP) that addresses data misspeculation in a distributed uniprocessor named TFlex. The partitioned nature of distributed uniprocessors aggravates the penalty of data misspeculation. After detecting a misspeculation, DRP avoids squashing the pipeline; instead, it retains all instructions in the window and selectively replays only those that depend on the misspeculated data. As one possible use of DRP, we apply it to recovery from data dependence speculation. We also summarize the challenges of implementing a selective replay mechanism on distributed uniprocessors and propose two variations of DRP that address them. The evaluation results show that, without data speculation, DRP achieves 99% of the performance of perfect memory disambiguation. It speeds up diverse applications over baseline TFlex (with a state-of-the-art data dependence predictor) by a geometric mean of 24%.
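
To make the contrast between squashing and selective replay concrete, the following is a minimal, centralized sketch of the general idea only. It is not the DRP protocol itself, which operates across distributed cores and is not specified in this abstract. The function names, the `consumers` dependence map, and the instruction labels are all hypothetical and chosen for illustration.

```python
from collections import deque


def forward_slice(consumers, source):
    """Return every instruction reachable from `source` by following
    producer -> consumer edges (hypothetical dependence map)."""
    seen = set()
    work = deque(consumers.get(source, ()))
    while work:
        inst = work.popleft()
        if inst in seen:
            continue
        seen.add(inst)
        work.extend(consumers.get(inst, ()))
    return seen


def selective_replay(window, consumers, misspeculated):
    """Instructions to re-execute, in window order: the misspeculated
    instruction plus everything that transitively depends on it."""
    to_replay = forward_slice(consumers, misspeculated) | {misspeculated}
    return [inst for inst in window if inst in to_replay]


def full_squash(window, misspeculated):
    """Baseline for comparison: discard and re-execute everything from
    the misspeculated instruction onward, dependent or not."""
    return window[window.index(misspeculated):]
```

With a window of six instructions in which only `add2` and `st5` depend on a misspeculated load `ld1`, `selective_replay` would return three instructions while `full_squash` would return five; the independent `mul3` and `sub4` keep their results in the selective case.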