Memory Latency Effects in Decoupled Architectures

Authors:
L. Kurian;P. T. Hulina;L. D. Coraor
Venue:
IEEE Transactions on Computers
Year:
1994

References: 16
Cited by: 9

A Simulation Study of Decoupled Architecture Computers
IEEE Transactions on Computers

Line (block) size choice for CPU cache memories
IEEE Transactions on Computers

The ZS-1 central processor
ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems

Performance evaluation of on-chip register and cache organizations
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture

MIPS RISC architecture
MIPS RISC architecture

Dynamic Instruction Scheduling and the Astronautics ZS-1
Computer

Improving performance of small on-chip instruction caches
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture

Implementation of the PIPE Processor
Computer - Special issue on experimental research in computer architecture

A Performance Comparison of the IBM RS/6000 and the Astronautics ZS-1
Computer - Special issue on experimental research in computer architecture

Classification and performance evaluation of instruction buffering techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture

Memory latency effects in decoupled architectures with a single data memory module
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture

Evaluation of the WM architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture

PIPE: a VLSI decoupled architecture
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture

Cache Memories
ACM Computing Surveys (CSUR)

Decoupled access/execute computer architectures
ACM Transactions on Computer Systems (TOCS)

Performance Trade-Offs for Microprocessor Cache Memories
IEEE Micro

Improving Latency Tolerance of Multithreading through Decoupling
IEEE Transactions on Computers

A Simulation Study of Decoupled Vector Architectures
The Journal of Supercomputing

Program balance and its impact on high performance RISC architectures
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture

Decoupled vector architectures
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture

Design and evaluation of a hierarchical decoupled architecture
The Journal of Supercomputing

Interactive presentation: A decoupled architecture of processors with scratch-pad memory hierarchy
Proceedings of the conference on Design, automation and test in Europe

A complexity-effective microprocessor design with decoupled dispatch queues and prefetching
Parallel Computing

Decoupled Processors Architecture for Accelerating Data Intensive Applications using Scratch-Pad Memory Hierarchy
Journal of Signal Processing Systems

Design and effectiveness of small-sized decoupled dispatch queues
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing

Quantified Score

Hi-index: 14.98

Abstract

Decoupled computer architectures partition the memory access and execute functions of a computer program and achieve high performance by exploiting the fine-grain parallelism between the two. These architectures use an access processor to fetch data ahead of demand by the execute process and hence are often less sensitive to memory access delays than conventional architectures. Past performance studies of decoupled computers used interleaved or pipelined memory systems, so latency effects in those studies were partially hidden by the interleaving. This paper presents a detailed simulation study of latency effects in decoupled computers. Decoupled architecture performance is compared with that of single processors with caches, and the memory latency sensitivity of cache-based uniprocessors and decoupled systems is studied. Simulations are also performed to determine the significance of data caches in a decoupled architecture. It is observed that decoupled architectures can reduce the peak memory bandwidth requirement but not the total bandwidth, whereas data caches can reduce the total bandwidth by capturing locality. The study concludes that, despite their ability to partially mask the effects of memory latency, decoupled architectures still need a data cache.
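
The run-ahead behaviour described in the abstract can be illustrated with a toy cycle-count model. This is only a sketch under stated assumptions, not the simulator used in the paper: the function names, the one-operand-per-load workload, and the parameters `queue_depth` and `mem_interval` (the minimum number of cycles between successive memory requests, where 1 models a fully pipelined or interleaved memory and a value equal to the latency models a single non-pipelined module) are all hypothetical.

```python
def coupled_cycles(n_loads, latency, compute):
    """Conventional in-order processor without a cache (assumed baseline):
    it stalls for the full latency of every load, then computes."""
    return n_loads * (latency + compute)


def decoupled_cycles(n_loads, latency, compute, queue_depth, mem_interval=1):
    """Toy decoupled access/execute model (illustrative assumptions only).

    The access side issues loads ahead of the execute side. A load may be
    issued once the memory accepts a new request (mem_interval cycles after
    the previous issue) and once a load-queue slot is free, i.e. the operand
    queue_depth positions earlier has been consumed by the execute side.
    """
    done = []          # done[i]: cycle at which the execute side finishes operand i
    prev_issue = None  # cycle at which the previous load was issued
    for i in range(n_loads):
        issue = 0 if prev_issue is None else prev_issue + mem_interval
        if i >= queue_depth:
            # Wait for a free queue slot before running further ahead.
            issue = max(issue, done[i - queue_depth])
        arrival = issue + latency
        # The execute side starts when the operand has arrived and the
        # previous operand has been fully processed.
        start = max(arrival, done[-1]) if done else arrival
        done.append(start + compute)
        prev_issue = issue
    return done[-1] if done else 0
```

With the toy parameters of 100 loads, a latency of 10 cycles, and 2 compute cycles per operand, the coupled model takes 1200 cycles, while the decoupled model with an 8-entry queue and a pipelined memory takes 210. Setting `mem_interval` equal to the latency raises the decoupled figure to 1002 cycles, which loosely mirrors the abstract's remark that earlier studies saw latency partly hidden by interleaving. A queue depth of 1 removes the run-ahead entirely and reproduces the coupled result.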