We study the trade-off between throughput and memory footprint in embedded software synthesized from acyclic static dataflow (task graph) specifications for distributed-memory multiprocessors. We identify iteration overlapping as a knob in the synthesis process by which one can trade application throughput against memory requirement. Given an initial processor assignment and a non-overlapped task schedule, we formally present the underlying properties of the problem, such as the constraints on a valid iteration overlapping, the maximum possible throughput, and the minimum memory footprint. Moreover, we develop an effective algorithm that generates a rich set of design points covering a range of trade-off options. Experimental results on a number of applications and architectures validate the effectiveness of our approach.