In this paper we investigate the behavior of data prefetching on an access decoupled machine and on a superscalar machine. We assess whether the decoupling paradigm offers benefits, given that an out-of-order (o-o-o) superscalar architecture could in principle prefetch to the same degree as an access decoupled machine. We find that for large issue widths the access decoupled machine hides memory latency more effectively than a single-instruction-window o-o-o superscalar architecture. Our findings also show that an access decoupled machine has the added benefit of reducing the complexity of the window issue logic.