We propose a subordinate threading scheme in which the main thread skips instructions that are guaranteed to be correctly executed by the subordinate thread. Speeding up the main thread increases the overall speed of the processor. A faster main thread can also detect the subordinate thread's mispredictions earlier, thereby cutting down the time the subordinate thread spends on wrong-path instructions. The subordinate thread is therefore free to speculate more aggressively. We develop a cycle-accurate simulator and evaluate our symbolic subordinate threading scheme on the SPEC2000 integer benchmarks. Our results show an average performance improvement of 21% over a base subordinate threading scheme that does not let the main thread skip any instructions.