Fast co-simulation of transformative systems with OS support on SMP computer
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Experimental analysis of approximation algorithms for the vertex cover and set covering problems
Computers and Operations Research
High-Performance Buffer Mapping to Exploit DRAM Concurrency in Multiprocessor DSP Systems
RSP '09 Proceedings of the 2009 IEEE/IFIP International Symposium on Rapid System Prototyping
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
High-performance and low-energy buffer mapping method for multiprocessor DSP systems
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 35.68 |
This paper is concerned with multiprocessor implementations of embedded applications specified as iterative dataflow programs in which synchronization overhead can be significant. We develop techniques to alleviate this overhead by determining a minimal set of processor synchronizations that are essential for correct execution. Our study is based in the context of self-timed execution of iterative dataflow programs. An iterative dataflow program consists of a dataflow representation of the body of a loop that is to be iterated an indefinite number of times; dataflow programming in this form has been studied and applied extensively, particularly in the context of signal processing software. Self-timed execution refers to a combined compile-time/run-time scheduling strategy in which processors synchronize with one another based only on interprocessor communication requirements, and thus, synchronization of processors at the end of each loop iteration does not generally occur. We introduce a new graph-theoretic framework based on a data structure called the synchronization graph for analyzing and optimizing synchronization overhead in self-timed, iterative dataflow programs. We show that the comprehensive techniques that have been developed for removing redundant synchronizations in noniterative programs can be extended in this framework to optimally remove redundant synchronizations in our context. We also present an optimization that converts a feedforward dataflow graph into a strongly connected graph in such a way as to reduce synchronization overhead without slowing down execution