Although the dataflow model has been shown to allow the exploitation of parallelism at all levels, research over the past decade has revealed several fundamental problems: synchronization at the instruction level, token matching, and coloring and re-labeling operations significantly increase the number of non-compute "overhead" cycles, degrading performance. Many hybrid von Neumann/data-driven machines have recently been proposed to alleviate some of these problems. Their major objective has been to reduce or eliminate unnecessary synchronization costs through simplified operand-matching schemes and increased task granularity. Moreover, recent studies quantifying locality suggest that dataflow execution exhibits sufficient spatial and temporal locality to merit its exploitation.

In this paper we present a data structure for exploiting locality in a data-driven environment: the Vector Cell. A Vector Cell consists of a number of fixed-length chunks of data elements. Each chunk is tagged with a presence bit, providing intra-chunk strictness and inter-chunk non-strictness for data-structure accesses. We describe the semantics of the model, the processor architecture and instruction set, and a Sisal-to-dataflow vectorizing compiler back-end. The model is evaluated by comparing its performance to that of both a classical fine-grain dataflow processor employing I-structures and a conventional pipelined vector processor. Results indicate that the model is surprisingly resilient to long memory and communication latencies, and is able to dynamically exploit the underlying parallelism across multiple processing elements at run time.
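To make the access semantics concrete, the following is a minimal sketch (not the paper's hardware design; all names are hypothetical) of a Vector Cell as fixed-length chunks guarded by one presence bit each. A read of any element must wait until its entire chunk has been written (intra-chunk strictness), but each chunk becomes readable independently of the others (inter-chunk non-strictness), so consumers can proceed chunk by chunk as producers finish.

```python
class VectorCell:
    """Sketch of a Vector Cell: fixed-length chunks, one presence bit per chunk."""

    def __init__(self, length, chunk_size):
        self.chunk_size = chunk_size
        n_chunks = (length + chunk_size - 1) // chunk_size
        self.chunks = [[None] * chunk_size for _ in range(n_chunks)]
        self.present = [False] * n_chunks            # one presence bit per chunk
        self.deferred = [[] for _ in range(n_chunks)]  # reads waiting on a chunk

    def write_chunk(self, chunk_idx, values):
        """Producer writes a whole chunk at once, then sets its presence bit."""
        self.chunks[chunk_idx] = list(values)
        self.present[chunk_idx] = True
        # Release any reads that were deferred on this chunk.
        for offset, consumer in self.deferred[chunk_idx]:
            consumer(self.chunks[chunk_idx][offset])
        self.deferred[chunk_idx].clear()

    def read(self, index, consumer):
        """Consumer reads one element; if its chunk is absent, the read defers."""
        chunk_idx, offset = divmod(index, self.chunk_size)
        if self.present[chunk_idx]:
            consumer(self.chunks[chunk_idx][offset])
        else:
            self.deferred[chunk_idx].append((offset, consumer))


# Inter-chunk non-strictness: chunk 1 is consumed before chunk 0 exists.
cell = VectorCell(length=8, chunk_size=4)
results = []
cell.read(5, results.append)            # defers: chunk 1 not yet present
cell.write_chunk(1, [40, 50, 60, 70])   # presence bit set; deferred read fires
cell.read(6, results.append)            # immediate: chunk 1 is present
print(results)  # [50, 60]
```

Compared with an I-structure, which keeps a presence bit per element, this scheme amortizes synchronization over a chunk: one bit check covers `chunk_size` elements, which is the source of the reduced overhead the abstract describes, at the cost of delaying reads until the whole chunk is produced.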