We investigate the relative performance impact of non-blocking loads, stream buffers, and speculative execution, used both individually and in conjunction with each other. We have simulated the SPEC92 benchmarks on a statically scheduled quad-issue processor model, running code from the Multiflow compiler. Non-blocking loads and stream buffers both provide a significant performance advantage, and their combination performs significantly better than either alone. For example, with a 64-byte, 2-way set associative cache with a 32-cycle fetch latency, non-blocking loads reduce the run-time by 21% while stream buffers reduce it by 26%, and the combined use of the two yields a 47% reduction. The addition of speculative execution further improves the performance of the systems that we have simulated, with or without non-blocking loads and stream buffers, by an additional 20% to 40%. We expect that the use of all three of these techniques will be important in future generations of microprocessors.
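
To put the quoted run-time reductions in more familiar terms, the short sketch below converts each one into a speedup factor using speedup = 1 / (1 - reduction). It is an illustration only, not part of the study: the helper name `speedup_from_reduction` is hypothetical, and the "multiplicative composition" figure is an assumed baseline for comparison (what the combination would yield if each technique removed its stated fraction of whatever run-time remained), not something the authors report.

```python
def speedup_from_reduction(reduction):
    """Convert a fractional run-time reduction into a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


# Run-time reductions quoted in the abstract for the example cache configuration.
reductions = {
    "non-blocking loads": 0.21,
    "stream buffers": 0.26,
    "both combined": 0.47,
}

speedups = {name: speedup_from_reduction(r) for name, r in reductions.items()}

# Assumed comparison baseline (not from the paper): the reduction the
# combination would give if the two individual reductions composed
# multiplicatively on the remaining run-time.
independent = 1.0 - (1.0 - reductions["non-blocking loads"]) * (
    1.0 - reductions["stream buffers"]
)

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x speedup")
print(f"multiplicative composition would predict a {independent:.1%} reduction")
```

Under these assumptions the reported 47% combined reduction equals the sum of the two individual figures and is larger than the multiplicative baseline would predict, which is consistent with the statement that the combination performs significantly better than either technique alone.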