A Simulation Study of Decoupled Architecture Computers
IEEE Transactions on Computers
Line (block) size choice for CPU cache memories
IEEE Transactions on Computers
ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
Performance evaluation of on-chip register and cache organizations
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
MIPS RISC architecture
Improving performance of small on-chip instruction caches
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Implementation of the PIPE Processor
Computer - Special issue on experimental research in computer architecture
A Performance Comparison of the IBM RS/6000 and the Astronautics ZS-1
Computer - Special issue on experimental research in computer architecture
Classification and performance evaluation of instruction buffering techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Memory latency effects in decoupled architectures with a single data memory module
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Evaluation of the WM architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
PIPE: a VLSI decoupled architecture
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Decoupled access/execute computer architectures
ACM Transactions on Computer Systems (TOCS)
Improving Latency Tolerance of Multithreading through Decoupling
IEEE Transactions on Computers
A Simulation Study of Decoupled Vector Architectures
The Journal of Supercomputing
Program balance and its impact on high performance RISC architectures
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Decoupled vector architectures
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Design and evaluation of a hierarchical decoupled architecture
The Journal of Supercomputing
Interactive presentation: A decoupled architecture of processors with scratch-pad memory hierarchy
Proceedings of the conference on Design, automation and test in Europe
Journal of Signal Processing Systems
Design and effectiveness of small-sized decoupled dispatch queues
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Hi-index | 14.98 |
Decoupled computer architectures partition the memory access and execute functions in a computer program and achieve high-performance by exploiting the fine-grain parallelism between the two. These architectures make use of an access processor to perform the data fetch ahead of demand by the execute process and hence are often less sensitive to memory access delays than conventional architectures. Past performance studies of decoupled computers used memory systems that are interleaved or pipelined, and in those studies, latency effects were partially hidden due to interleaving. A detailed simulation study of the latency effects in decoupled computers is undertaken in this paper. Decoupled architecture performance is compared to single processors with caches. The memory latency sensitivity of cache based uniprocessors and decoupled systems is studied. Simulations are performed to determine the significance of data caches in a decoupled architecture. It is observed that decoupled architectures can reduce the peak memory bandwidth requirement, but not the total bandwidth, whereas data caches can reduce the total bandwidth by capturing locality. It may be concluded that despite their capability to partially mask the effects of memory latency, decoupled architectures still need a data cache.