Decoupled access/execute computer architectures

Authors:
James E. Smith
Affiliations:
Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, Wisconsin
Venue:
ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Year:
1982

Citing 2
Cited 62

Design of a Computer—The Control Data 6600

Planning a computer system: Project Stretch


Processor Scheduling for Linearly Connected Parallel Processors

IEEE Transactions on Computers
A Simulation Study of Decoupled Architecture Computers

IEEE Transactions on Computers
Highly concurrent scalar processing

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Multiple instruction issue and single-chip processors

MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Implementation of the PIPE Processor

Computer - Special issue on experimental research in computer architecture
High-bandwidth data memory systems for superscalar processors

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An effective on-chip preloading scheme to reduce data access penalty

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Effects of building blocks on the performance of super-scalar architecture

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Pseudo vector processor based on register-windowed superscalar pipeline

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Hardware implementation issues of data prefetching

ICS '95 Proceedings of the 9th international conference on Supercomputing
An effective programmable prefetch engine for on-chip caches

Proceedings of the 28th annual international symposium on Microarchitecture
Decoupling integer execution in superscalar processors

Proceedings of the 28th annual international symposium on Microarchitecture
A comparison of superscalar and decoupled access/execute architectures

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Techniques for extracting instruction level parallelism on MIMD architectures

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
On high-bandwidth data cache design for multi-issue processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
An empirical analysis of instruction repetition

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Dependence based prefetching for linked data structures

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious data placement

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A dynamic scheduling logic for exploiting multiple functional units in single chip multithreaded architectures

Proceedings of the 1999 ACM symposium on Applied computing
Optimal scheduling of arithmetic operations in parallel with memory access (preliminary version)

POPL '85 Proceedings of the 12th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Optimal code generation for expressions on super scalar machines

ACM '86 Proceedings of 1986 ACM Fall joint computer conference
An investigation of static versus dynamic scheduling

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning

International Journal of Parallel Programming
Decoupled access/execute computer architectures

ACM Transactions on Computer Systems (TOCS)
Slice-processors: an implementation of operation-based prediction

ICS '01 Proceedings of the 15th international conference on Supercomputing
Evaluating the Use of Register Queues in Software Pipelined Loops

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Parallel architecture and compilation techniques: selection of workshop papers, guests' editors introduction

ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
Multithreading decoupled architectures for complexity-effective general purpose computing

ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
Sunder: a programmable hardware prefetch architecture for numerical loops

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Guest Editor's Introduction Real Machines: Design Choices/Engineering Trade-Offs

Computer
Effective Hardware-Based Data Prefetching for High-Performance Processors

IEEE Transactions on Computers
A Hardware Scheme for Data Prefetching

HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
Slipstream Execution Mode for CMP-Based Multiprocessors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
A code-splitting algorithm

ACM SIGARCH Computer Architecture News
Decoupled Software Pipelining with the Synchronization Array

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Cache Refill/Access Decoupling for Vector Machines

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Tolerating memory latency through push prefetching for pointer-intensive applications

ACM Transactions on Architecture and Code Optimization (TACO)
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A Criticality Analysis of Clustering in Superscalar Processors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Automatic Thread Extraction with Decoupled Software Pipelining

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
An Efficient Way of Passing of Data in a Multithreaded Scheduled Dataflow Architecture

HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Design and evaluation of a hierarchical decoupled architecture

The Journal of Supercomputing
Interactive presentation: A decoupled architecture of processors with scratch-pad memory hierarchy

Proceedings of the conference on Design, automation and test in Europe
The Cray BlackWidow: a highly scalable vector multiprocessor

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Server-based data push architecture for multi-processor environments

Journal of Computer Science and Technology
Performance scalability of decoupled software pipelining

ACM Transactions on Architecture and Code Optimization (TACO)
Guided Prefetching Based on Runtime Access Patterns

ICCS '08 Proceedings of the 8th international conference on Computational Science, Part III
An interleaved array-processing architecture

AFIPS '84 Proceedings of the July 9-12, 1984, national computer conference and exposition
A complexity-effective microprocessor design with decoupled dispatch queues and prefetching

Parallel Computing
Decoupled Processors Architecture for Accelerating Data Intensive Applications using Scratch-Pad Memory Hierarchy

Journal of Signal Processing Systems
Exploiting execution locality with a decoupled Kilo-instruction processor

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
WiDGET: Wisconsin decoupled grid execution tiles

Proceedings of the 37th annual international symposium on Computer architecture
Inter-core prefetching for multicore processors using migrating helper threads

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
CoRAM: an in-fabric memory architecture for FPGA-based computing

Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
OUTRIDER: efficient memory latency tolerance with decoupled strands

Proceedings of the 38th annual international symposium on Computer architecture
Design and effectiveness of small-sized decoupled dispatch queues

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
A template system for the efficient compilation of domain abstractions onto reconfigurable computers

Journal of Systems Architecture: the EUROMICRO Journal
Control-Flow Decoupling

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Towards more efficient execution: a decoupled access-execute approach

Proceedings of the 27th international ACM conference on International conference on supercomputing

Quantified Score

Hi-index	0.02


Abstract

An architecture for improving computer performance is presented and discussed. The main feature of the architecture is a high degree of decoupling between operand access and execution. This results in an implementation which has two separate instruction streams that communicate via queues. A similar architecture has been previously proposed for array processors, but in that context the software is called on to do most of the coordination and synchronization between the instruction streams. This paper emphasizes implementation features that remove this burden from the programmer. Performance comparisons with a conventional scalar architecture are given, and these show that considerable performance gains are possible. Single instruction stream versions, both physical and conceptual, are discussed with the primary goal of minimizing the differences with conventional architectures. This would allow known compilation and programming techniques to be used. Finally, the problem of deadlock in such a system is discussed, and one possible solution is given.
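
As a rough illustration of the organization the abstract describes, the following Python sketch models two streams that communicate only through queues: an access stream that loads operands and stores results, and an execute stream that does the arithmetic. It is a minimal sketch under stated assumptions, not the paper's design. The names run_decoupled, load_q, store_q and queue_depth are hypothetical, the workload (an element-wise product) is chosen only for illustration, and the two streams are interleaved in a single loop rather than run concurrently as hardware would.

```python
from collections import deque


def run_decoupled(x, y, queue_depth=4):
    """Compute z[i] = x[i] * y[i] using separate access and execute streams.

    Hypothetical model: the streams share no state except two bounded FIFO
    queues, and each loop iteration gives every stream one chance to act.
    """
    if queue_depth < 1:
        raise ValueError("queue_depth must be at least 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    n = len(x)
    z = [None] * n
    load_q = deque()   # access -> execute: operand pairs fetched from memory
    store_q = deque()  # execute -> access: results waiting to be stored
    next_load = 0      # index of the next element the access stream loads
    next_store = 0     # index of the next element the access stream stores

    while next_store < n:
        # Access stream: write back one finished result, if any.
        if store_q:
            z[next_store] = store_q.popleft()
            next_store += 1
        # Access stream: fetch the next operand pair if the queue has room.
        if next_load < n and len(load_q) < queue_depth:
            load_q.append((x[next_load], y[next_load]))
            next_load += 1
        # Execute stream: consume operands and produce a result. It never
        # computes an address or touches memory directly.
        if load_q and len(store_q) < queue_depth:
            a, b = load_q.popleft()
            store_q.append(a * b)

    return z
```

Because the queues are first-in, first-out, the execute stream receives operands in the order the access stream requested them, so no explicit synchronization between the two streams appears in the sketch. A call such as run_decoupled([1, 2, 3], [4, 5, 6]) returns the element-wise products in order.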