In this paper, the Scheduled Dataflow (SDF) architecture, a decoupled memory/execution, multithreaded architecture using nonblocking threads, is presented in detail and evaluated against a superscalar architecture. Recent work on new processor architectures has focused mainly on VLIW (e.g., IA-64), superscalar, and superspeculative designs. This trend allows for better performance, but at the expense of increased hardware complexity and, possibly, higher power consumption resulting from dynamic instruction scheduling. Our research departs from this trend by exploring a simpler, yet powerful, execution paradigm based on dataflow and multithreading. A program is partitioned into nonblocking execution threads, and all memory accesses are decoupled from a thread's execution: data is preloaded into the thread's context (registers), and all results are poststored after the thread completes execution. While multithreading and decoupling are possible with control-flow architectures, SDF makes it easier to coordinate the memory accesses and execution of a thread, as well as to eliminate unnecessary dependencies among instructions. We compared the execution cycles required by programs on SDF with those required on SimpleScalar (a superscalar simulator), considering the essential aspects of both architectures to ensure a fair comparison. The results show that the SDF architecture can outperform the superscalar architecture. SDF performance scales better with the number of functional units and allows good exploitation of Thread Level Parallelism (TLP) and of the available chip area.
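The preload/execute/poststore split described above can be illustrated with a small Python model. This is a minimal sketch under stated assumptions, not the SDF instruction set or its hardware scheduler. Memory is modeled as a dictionary. Each hypothetical `NonblockingThread` names the addresses it preloads and poststores. The hypothetical `schedule` function enables a thread only once all of its inputs exist, in the dataflow spirit of the architecture. Once started, a thread runs to completion without touching memory.

```python
"""Toy model of SDF-style nonblocking threads (illustrative assumptions only).

Hypothetical simplifications: memory is a dict keyed by address name, a
thread is enabled when all of its input addresses have been produced, and
the execute phase is a pure function over the thread's register context.
"""
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class NonblockingThread:
    name: str
    inputs: List[str]   # addresses preloaded into registers before execution
    outputs: List[str]  # addresses poststored after execution completes
    body: Callable[[Dict[str, int]], Dict[str, int]]  # register-only compute


def run_thread(thread: NonblockingThread, memory: Dict[str, int]) -> None:
    # Preload phase: all memory reads happen before execution begins,
    # so the execute phase can never stall waiting on memory.
    registers = {addr: memory[addr] for addr in thread.inputs}
    # Execute phase: operates only on the thread's register context.
    results = thread.body(registers)
    # Poststore phase: results are written back only after execution ends.
    for addr in thread.outputs:
        memory[addr] = results[addr]


def schedule(threads: List[NonblockingThread], memory: Dict[str, int]) -> List[str]:
    """Run threads in dataflow order and return the order in which they ran."""
    pending = list(threads)
    order: List[str] = []
    while pending:
        # A thread is enabled once every input it preloads is available.
        ready = [t for t in pending if all(a in memory for a in t.inputs)]
        if not ready:
            raise RuntimeError("deadlock: no pending thread has all its inputs")
        for t in ready:
            run_thread(t, memory)  # nonblocking: runs to completion
            order.append(t.name)
            pending.remove(t)
    return order
```

For example, with `memory = {"a": 2, "b": 3}`, a thread computing `p = a * b` and another computing `s = p + a` run in dependency order regardless of how they are listed, leaving `memory["s"] == 8`. Keeping all memory traffic in the preload and poststore phases is what lets the execute phase be modeled as a pure function.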