Computer
The Scheduled Dataflow (SDF) architecture deviates from the current trend of building complex hardware to exploit Instruction Level Parallelism (ILP) by exploring a simpler yet powerful execution paradigm based on dataflow, multithreading, and the decoupling of memory accesses from execution. A program is partitioned into non-blocking threads, and all memory accesses are decoupled from each thread's execution: data is pre-loaded into the thread's context (registers), and all results are post-stored after the thread completes execution. This paper presents an efficient way of storing data directly into a thread's register context rather than into frame memory. This approach eliminates the need to create thread frames when sufficient register contexts are available in the system, which makes it possible to explore how the SDF architecture's performance scales as more register contexts become available on the chip. All benchmarks run using these two methods show a performance improvement of roughly 20% or more. This method of allocating data to a consecutive thread in a multithreaded architecture could be applied generally.
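To make the difference between the two storage paths concrete, here is a toy Python model that counts data movements. It is an illustrative sketch and not the paper's simulator: the names `Thread`, `Machine`, `deliver`, `run`, and `demo` are hypothetical, the fallback policy (use frame memory when no register context is free) is an assumption read from the abstract, and the counters track operations only, not cycles, so they do not reproduce the reported speedup.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Thread:
    """A non-blocking thread: it may run only once all inputs have arrived."""
    name: str
    n_inputs: int
    frame: Dict[int, int] = field(default_factory=dict)  # frame-memory slots
    regs: Optional[Dict[int, int]] = None                # register context, if assigned


class Machine:
    """Hypothetical machine with a fixed pool of register contexts."""

    def __init__(self, free_contexts: int, direct_store: bool):
        self.free_contexts = free_contexts
        self.direct_store = direct_store  # assumption: selects the storage path
        self.frame_stores = 0    # results written to frame memory
        self.preloads = 0        # frame-to-register copies before execution
        self.direct_stores = 0   # results written straight into registers

    def _allocate(self, t: Thread) -> bool:
        """Give the thread a register context if one is free."""
        if self.free_contexts == 0:
            return False
        self.free_contexts -= 1
        t.regs = {}
        return True

    def deliver(self, consumer: Thread, slot: int, value: int) -> None:
        """Post-store one result of a finished thread to its consumer."""
        can_go_direct = self.direct_store and (
            consumer.regs is not None
            # Only start a direct context if nothing is in the frame yet,
            # so a thread's inputs never end up split across both places.
            or (not consumer.frame and self._allocate(consumer))
        )
        if can_go_direct:
            consumer.regs[slot] = value
            self.direct_stores += 1
        else:
            consumer.frame[slot] = value
            self.frame_stores += 1

    def run(self, consumer: Thread, body: Callable[[Dict[int, int]], int]) -> int:
        """Pre-load (if needed), execute, then release the register context."""
        if consumer.regs is None:
            # Frame path: a context is assigned now and the frame is copied in.
            if not self._allocate(consumer):
                raise RuntimeError("no free register context")
            for slot, value in consumer.frame.items():
                consumer.regs[slot] = value
                self.preloads += 1
        if len(consumer.regs) != consumer.n_inputs:
            raise RuntimeError("thread is not ready: inputs missing")
        result = body(consumer.regs)  # execution touches registers only
        consumer.regs = None
        consumer.frame.clear()
        self.free_contexts += 1
        return result


def demo(direct_store: bool):
    """Feed two inputs to one consumer thread and report the counters."""
    m = Machine(free_contexts=1, direct_store=direct_store)
    t = Thread("add", n_inputs=2)
    m.deliver(t, 0, 2)
    m.deliver(t, 1, 3)
    result = m.run(t, lambda r: r[0] + r[1])
    return result, m.frame_stores, m.preloads, m.direct_stores
```

In this model, `demo(False)` performs two frame stores followed by two pre-loads, while `demo(True)` performs two direct register stores and no frame traffic; both compute the same result. The sketch deliberately leaves out scheduling, waiting for a free context, and timing.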