A Simulation Study of Decoupled Architecture Computers
IEEE Transactions on Computers
Line (block) size choice for CPU cache memories
IEEE Transactions on Computers
ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
Performance evaluation of on-chip register and cache organizations
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
MIPS RISC architecture
Improving performance of small on-chip instruction caches
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Implementation of the PIPE Processor
Computer - Special issue on experimental research in computer architecture
A Performance Comparison of the IBM RS/6000 and the Astronautics ZS-1
Computer - Special issue on experimental research in computer architecture
Classification and performance evaluation of instruction buffering techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
PIPE: a VLSI decoupled architecture
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Decoupled access/execute computer architectures
ACM Transactions on Computer Systems (TOCS)
Load latency tolerance in dynamically scheduled processors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Memory Latency Effects in Decoupled Architectures
IEEE Transactions on Computers
Program balance and its impact on high performance RISC architectures
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
SST: Symbolic Subordinate Threading
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
OUTRIDER: efficient memory latency tolerance with decoupled strands
Proceedings of the 38th annual international symposium on Computer architecture
Hi-index | 0.01 |
Decoupled computer architectures partition the memory access and execute functions in a computer program and achieve high performance by exploiting the fine-grain parallelism between the two. These architectures make use of an access processor to perform the data fetch ahead of demand by the execute process and hence are often less sensitive to memory access delays than conventional architectures. Past performance studies of decoupled computers used memory systems that are interleave or pipelined. We undertake a simulation study of the latency effects in decoupled computers when connected to a single, conventional non-interleaved data memory module so that the effect of decoupling is isolated from the improvement caused by interleaving. We compare decoupled computer performance to single processors with caches, study the memory latency sensitivity of the decoupled systems, and also perform simulations to determine the significance of data caches in a decoupled computer architecture. The Lawrence Livermore Loops and two signal processing algorithms are used as the simulation benchmark.