A model for dataflow based vector execution

  • Authors:
  • W. Marcus Miller; Walid A. Najjar; A. P. Wim Böhm

  • Affiliations:
  • IBM Corporation, Networking Systems Division, Research Triangle Park, NC; Department of Computer Science, Colorado State University, Fort Collins, CO; Department of Computer Science, Colorado State University, Fort Collins, CO

  • Venue:
  • ICS '94 Proceedings of the 8th international conference on Supercomputing
  • Year:
  • 1994

Abstract

Although the dataflow model has been shown to allow the exploitation of parallelism at all levels, research over the past decade has revealed several fundamental problems: synchronization at the instruction level, token matching, coloring, and re-labeling operations have a negative impact on performance by significantly increasing the number of non-compute “overhead” cycles. Recently, many novel hybrid von Neumann/data-driven machines have been proposed to alleviate some of these problems. The major objective has been to reduce or eliminate unnecessary synchronization costs through simplified operand-matching schemes and increased task granularity. Moreover, results from recent studies quantifying locality suggest that sufficient spatial and temporal locality is present in dataflow execution to merit its exploitation.

In this paper we present a data structure for exploiting locality in a data-driven environment: the Vector Cell. A Vector Cell consists of a number of fixed-length chunks of data elements. Each chunk is tagged with a presence bit, providing intra-chunk strictness and inter-chunk non-strictness to data structure accesses. We describe the semantics of the model, the processor architecture and instruction set, as well as a Sisal-to-dataflow vectorizing compiler back-end. The model is evaluated by comparing its performance to that of both a classical fine-grain dataflow processor employing I-structures and a conventional pipelined vector processor. Results indicate that the model is surprisingly resilient to long memory and communication latencies, and is able to dynamically exploit the underlying parallelism across multiple processing elements at run time.
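The chunk-level presence-bit scheme described in the abstract can be illustrated with a minimal thread-based sketch. This is an assumption-laden approximation, not the paper's hardware design: the class name `VectorCell`, the `CHUNK_SIZE` value, and the use of a lock plus per-chunk events are all illustrative choices. The point it demonstrates is the access semantics: a read of any element blocks until its entire chunk has been written (intra-chunk strictness), while other chunks can be filled and read independently (inter-chunk non-strictness).

```python
import threading

CHUNK_SIZE = 4  # fixed chunk length; an illustrative value, not from the paper


class VectorCell:
    """Sketch of a Vector Cell: data split into fixed-length chunks,
    each guarded by a presence flag that is set once the chunk is full."""

    def __init__(self, length):
        self.length = length
        n_chunks = (length + CHUNK_SIZE - 1) // CHUNK_SIZE
        self.data = [None] * length
        # One presence "bit" per chunk, modeled here as a threading.Event.
        self.present = [threading.Event() for _ in range(n_chunks)]
        self._filled = [0] * n_chunks  # elements written into each chunk so far
        self._lock = threading.Lock()

    def write(self, i, value):
        c = i // CHUNK_SIZE
        with self._lock:
            self.data[i] = value
            self._filled[c] += 1
            chunk_len = min(CHUNK_SIZE, self.length - c * CHUNK_SIZE)
            if self._filled[c] == chunk_len:
                self.present[c].set()  # chunk complete: release blocked readers

    def read(self, i):
        c = i // CHUNK_SIZE
        # Intra-chunk strictness: block until the whole chunk is present.
        # Other chunks are unaffected (inter-chunk non-strictness).
        self.present[c].wait()
        return self.data[i]
```

For example, after writing elements 0 through 3 of an 8-element cell, `read(2)` returns immediately even though the second chunk is still empty; a read of element 5 would block until elements 4 through 7 were all written.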