Optimizing synchronization in multiprocessor DSP systems

Authors:
S.S. Bhattacharyya;S. Sriram;E.A. Lee
Affiliations:
Semicond. Res. Lab., Hitachi America Ltd., San Jose, CA;-;-
Venue:
IEEE Transactions on Signal Processing
Year:
1997

Citing: 0
Cited by: 5

Fast co-simulation of transformative systems with OS support on SMP computer
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis

Experimental analysis of approximation algorithms for the vertex cover and set covering problems
Computers and Operations Research

High-Performance Buffer Mapping to Exploit DRAM Concurrency in Multiprocessor DSP Systems
RSP '09 Proceedings of the 2009 IEEE/IFIP International Symposium on Rapid System Prototyping

Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures

High-performance and low-energy buffer mapping method for multiprocessor DSP systems
ACM Transactions on Embedded Computing Systems (TECS)

Quantified Score

Hi-index: 35.68

Abstract

This paper is concerned with multiprocessor implementations of embedded applications specified as iterative dataflow programs in which synchronization overhead can be significant. We develop techniques to alleviate this overhead by determining a minimal set of processor synchronizations that are essential for correct execution. Our study is set in the context of self-timed execution of iterative dataflow programs. An iterative dataflow program consists of a dataflow representation of the body of a loop that is to be iterated an indefinite number of times; dataflow programming in this form has been studied and applied extensively, particularly in the context of signal processing software. Self-timed execution refers to a combined compile-time/run-time scheduling strategy in which processors synchronize with one another based only on interprocessor communication requirements; thus, processors do not generally synchronize at the end of each loop iteration. We introduce a new graph-theoretic framework, based on a data structure called the synchronization graph, for analyzing and optimizing synchronization overhead in self-timed, iterative dataflow programs. We show that the comprehensive techniques that have been developed for removing redundant synchronizations in noniterative programs can be extended in this framework to optimally remove redundant synchronizations in our context. We also present an optimization that converts a feedforward dataflow graph into a strongly connected graph in such a way as to reduce synchronization overhead without slowing down execution.
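The idea of removing redundant synchronizations from a synchronization graph can be illustrated with a small Python sketch. It is not the paper's algorithm. It assumes one common redundancy criterion that the abstract does not spell out: a synchronization edge from x to y with delay d is treated as redundant when some other path from x to y has total delay at most d, so the ordering it enforces is already implied. The edge tuple layout `(src, dst, delay, is_sync)` and the function names `min_delay` and `remove_redundant` are hypothetical and chosen only for this illustration.

```python
import heapq


def min_delay(edges, src, dst, skip):
    """Smallest total delay over paths src -> dst that avoid edge index `skip`.

    Delays are assumed non-negative, so Dijkstra's algorithm applies.
    Returns float("inf") when no such path exists.
    """
    adj = {}
    for i, (u, v, d, _) in enumerate(edges):
        if i != skip:
            adj.setdefault(u, []).append((v, d))
    best = {src: 0}
    heap = [(0, src)]
    while heap:
        dist, u = heapq.heappop(heap)
        if u == dst:
            return dist
        if dist > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, d in adj.get(u, []):
            nd = dist + d
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def remove_redundant(edges):
    """Drop synchronization edges whose constraint is implied by another path.

    Edges are removed one at a time and the check is redone on the reduced
    graph, so two parallel edges cannot both be dropped on each other's account.
    """
    edges = list(edges)
    i = 0
    while i < len(edges):
        u, v, d, is_sync = edges[i]
        if is_sync and min_delay(edges, u, v, skip=i) <= d:
            del edges[i]  # assumed criterion: an alternative path already enforces it
        else:
            i += 1
    return edges


# Hypothetical three-actor example: A -> C is implied by A -> B -> C.
graph = [
    ("A", "B", 0, True),
    ("B", "C", 0, True),
    ("A", "C", 0, True),
]
reduced = remove_redundant(graph)
```

In this example only the A -> C edge is dropped, because the zero-delay path through B already forces C to wait for A. If the B -> C edge carried a delay of 1 instead, the alternative path would be weaker than the direct edge, and all three synchronizations would be kept.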