Static scheduling of synchronous data flow programs for digital signal processing
IEEE Transactions on Computers
Synthesis of Embedded Software from Synchronous Dataflow Specifications
Journal of VLSI Signal Processing Systems
A stream compiler for communication-exposed architectures
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Synthesis of an application-specific soft multiprocessor system
Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays
From single core to multi-core: preparing for a new exponential
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Versatile task assignment for heterogeneous soft dual-processor platforms
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
Pipelined execution of streaming applications enable processing of high-throughput data under performance constraint. We present an integrated approach to synthesizing pipelined software for dual-core architectures. We target streaming applications modeled as task graphs that are amenable to static analysis. We develop a versatile task assignment algorithm that considers the combined effect of workload imbalance between processors and inter-processor communication. Our technique, which runs in pseudo-linear time, provably maximizes application throughput. Furthermore, we develop an approximation algorithm for task assignment whose complexity is strictly polynomial. It provides the designer with an adjustable knob to controllably trade solution quality with algorithm runtime and memory requirement. Empirical throughput measurements using an FPGA-based dual-core system validate our theoretical results. Our exact algorithm consistently outperforms a recent competitor. Compared to exact task assignment, the approximate method runs about 3 times faster, requires about 20 times less memory, and results in only 1% to 5% throughput loss.