Monsoon: an explicit token-store architecture

Authors:
Gregory M. Papadopoulos; David E. Culler
Affiliations:
Laboratory for Computer Science, Massachusetts Institute of Technology; Computer Science Division, University of California, Berkeley
Venue:
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Year:
1990

Quantified Score

Hi-index	0.02

Abstract

Dataflow architectures tolerate long, unpredictable communication delays and support the generation and coordination of parallel activities directly in hardware, rather than assuming that program mapping will make these issues disappear. However, the mechanisms proposed to date are complex and introduce mapping complications of their own. This paper presents a greatly simplified approach to dataflow execution, called the explicit token store (ETS) architecture, and its current realization in Monsoon. The essence of dynamic dataflow execution is captured by a simple transition on state bits associated with storage local to a processor. Low-level storage management is performed by the compiler, which assigns nodes to slots in an activation frame, rather than dynamically in hardware. The processor is simple, highly pipelined, and quite general; it may be viewed as a generalization of a fairly primitive von Neumann architecture. Although the addressing capability is restrictive, exactly one instruction is executed for each action on the dataflow graph. Thus, the machine-oriented ETS model provides new understanding of the merits and the real cost of direct execution of dataflow graphs.
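The "simple transition on state bits" can be illustrated with a small sketch. It is not taken from the paper: the two-state encoding (`EMPTY`, `PRESENT`), the `Token`, `Instruction`, and `Slot` types, and the `step` function are hypothetical names chosen for illustration, and only two-input (dyadic) operators are modelled. The assumption is that the first operand to arrive is parked in a compiler-assigned frame slot, and the second operand finds it there, clears the slot, and fires the instruction.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical encoding of the per-slot state bits (an assumption for this sketch).
EMPTY, PRESENT = 0, 1


@dataclass
class Slot:
    """One word of an activation frame plus its state bits."""
    state: int = EMPTY
    value: Optional[float] = None


@dataclass
class Instruction:
    """A dyadic dataflow node; `slot` is the frame offset assigned at compile time."""
    op: Callable[[float, float], float]
    slot: int


@dataclass
class Token:
    """A value travelling to input `port` ("L" or "R") of instruction `ip`."""
    ip: int
    port: str
    value: float


def step(frame: List[Slot], program: List[Instruction], token: Token) -> Optional[float]:
    """Process one token; return a result only if the instruction fires."""
    instr = program[token.ip]
    slot = frame[instr.slot]
    if slot.state == EMPTY:
        # First operand to arrive: park it in the frame and wait for the partner.
        slot.state, slot.value = PRESENT, token.value
        return None
    # Second operand: take the partner out, free the slot, and fire.
    partner = slot.value
    slot.state, slot.value = EMPTY, None
    if token.port == "L":
        return instr.op(token.value, partner)
    return instr.op(partner, token.value)


# Usage: a single subtract node whose operands arrive right-first.
program = [Instruction(op=lambda a, b: a - b, slot=0)]
frame = [Slot()]
first = step(frame, program, Token(ip=0, port="R", value=2.0))   # parked, no result
second = step(frame, program, Token(ip=0, port="L", value=5.0))  # fires: 5.0 - 2.0
```

Because the slot offset is fixed when the program is built, matching needs no associative lookup in this sketch: each token touches exactly one frame location, and the slot is free for reuse once the instruction has fired.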