Toward a dataflow/von Neumann hybrid architecture

Authors:
R. A. Iannucci
Affiliations:
Massachusetts Institute of Technology
Venue:
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Year:
1988

Abstract

Dataflow architectures offer the ability to trade program level parallelism in order to overcome machine level latency. Dataflow further offers a uniform synchronization paradigm, representing one end of a spectrum wherein the unit of scheduling is a single instruction. At the opposite extreme are the von Neumann architectures which schedule on a task, or process, basis.

This paper examines the spectrum by proposing a new architecture which is a hybrid of dataflow and von Neumann organizations. The analysis attempts to discover those features of the dataflow architecture, lacking in a von Neumann machine, which are essential for tolerating latency and synchronization costs. These features are captured in the concept of a parallel machine language which can be grafted on top of an otherwise traditional von Neumann base. In such an architecture, the units of scheduling, called scheduling quanta, are bound at compile time rather than at instruction set design time. The parallel machine language supports this notion via a large synchronization name space.

A prototypical architecture is described, and results of simulation studies are presented. A comparison is made between the MIT Tagged-Token Dataflow machine and the subject machine which presents a model for understanding the cost of synchronization in a parallel environment.