Dataflow architectures tolerate long, unpredictable communication delays and support generation and coordination of parallel activities directly in hardware, rather than assuming that program mapping will cause these issues to disappear. However, the proposed mechanisms are complex and introduce new mapping complications. This paper presents a greatly simplified approach to dataflow execution, called the explicit token store (ETS) architecture, and its current realization in Monsoon. The essence of dynamic dataflow execution is captured by a simple transition on state bits associated with storage local to a processor. Low-level storage management is performed by the compiler in assigning nodes to slots in an activation frame, rather than dynamically in hardware. The processor is simple, highly pipelined, and quite general. It may be viewed as a generalization of a fairly primitive von Neumann architecture. Although the addressing capability is restrictive, there is exactly one instruction executed for each action on the dataflow graph. Thus, the machine-oriented ETS model provides new understanding of the merits and the real cost of direct execution of dataflow graphs.
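
The state-bit transition on frame-local storage can be illustrated with a small sketch. It assumes a two-input operator and a single presence bit per frame slot, with the slot offset fixed ahead of time (standing in for the compiler's assignment of nodes to slots). The names Slot, Instruction, Token and match_and_fire are hypothetical and chosen for illustration only; the actual Monsoon pipeline handles more cases than this simplified model.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Slot:
    """One word of an activation frame plus its state (assumed: one presence bit)."""
    present: bool = False
    port: Optional[str] = None   # which operand ("L" or "R") is waiting
    value: Optional[int] = None


@dataclass
class Instruction:
    """A two-input dataflow node; `slot` is the frame offset assigned ahead of time."""
    op: Callable[[int, int], int]
    slot: int


@dataclass
class Token:
    """A value tagged with its activation frame, target instruction and input port."""
    frame: int
    instr: int
    port: str
    value: int


def match_and_fire(frames: List[List[Slot]],
                   program: List[Instruction],
                   tok: Token) -> Optional[int]:
    """Apply the presence-bit transition for one arriving token.

    Empty slot: store the operand and mark the slot present (nothing fires).
    Occupied slot: read the partner, mark the slot empty, and fire the operator.
    """
    ins = program[tok.instr]
    slot = frames[tok.frame][ins.slot]

    if not slot.present:
        # First operand to arrive waits in the frame slot.
        slot.present, slot.port, slot.value = True, tok.port, tok.value
        return None

    # Second operand: order the pair by port so non-commutative ops are correct.
    if slot.port == "L":
        left, right = slot.value, tok.value
    else:
        left, right = tok.value, slot.value

    # Reset the slot so the frame location can be reused.
    slot.present, slot.port, slot.value = False, None, None
    return ins.op(left, right)


# Usage: one subtract node, two independent activation frames.
program = [Instruction(op=operator.sub, slot=0)]
frames = [[Slot()], [Slot()]]
print(match_and_fire(frames, program, Token(0, 0, "R", 3)))   # None: operand waits
print(match_and_fire(frames, program, Token(0, 0, "L", 10)))  # 7: partner found, node fires
```

Because the slot address is computed from the token's frame and the instruction's fixed offset, matching in this sketch is a direct memory access rather than an associative lookup, and tokens for different activations of the same node never interfere.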