Fibonacci heaps and their uses in improved network optimization algorithms
Journal of the ACM (JACM)
Determining the minimum iteration period of an algorithm
Journal of VLSI Signal Processing Systems
Bounded scheduling of process networks
Bounded scheduling of process networks
A coupled hardware and software architecture for programmable digital signal processors (synchronous data flow)
Scheduling dynamic dataflow graphs with bounded memory using the token flow model
Scheduling dynamic dataflow graphs with bounded memory using the token flow model
Design and programming of embedded multiprocessors: an interface-centric approach
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Queue - Multiprocessors
Dynamic thread assignment on heterogeneous multiprocessor architectures
Proceedings of the 3rd conference on Computing frontiers
Proceedings of the 43rd annual Design Automation Conference
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Computer Architecture, Fourth Edition: A Quantitative Approach
Computer Architecture, Fourth Edition: A Quantitative Approach
All-pairs bottleneck paths for general graphs in truly sub-cubic time
Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
Orchestrating the execution of stream programs on multicore platforms
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Stream Compilation for Real-Time Embedded Multicore Systems
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Mapping stream programs onto heterogeneous multiprocessor systems
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Requirements on the execution of Kahn process networks
ESOP'03 Proceedings of the 12th European conference on Programming
Faster maximum and minimum mean cycle algorithms for system-performance analysis
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Optimizing explicit data transfers for data parallel applications on the cell architecture
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Hi-index | 0.00 |
Stream programming is a promising way to expose concurrency to the compiler. A stream program is built from kernels that communicate only via point-to-point streams. The stream compiler statically allocates these kernels to processors, applying blocking, fission and fusion transformations. The compiler determines the sizes of the communication buffers, which affects performance since local memories can be small. In this paper, we propose a feedback-directed algorithm that determines the size of each communication buffer, based on i) the stream program that has been mapped onto processors, ii) feedback from an earlier execution, and iii) the memory constraints. The algorithm exposes a trade-off between throughput and latency. It is general, in that it applies to stream programs with unstructured stream graphs, and it supports variable execution times and communication rates. We show results for the StreamIt benchmarks and random graphs. For the StreamIt benchmarks, throughput is optimal after the first iteration. For random graphs with stochastic computation times, throughput is within 3% of optimal after four iterations. Compared with the previous general algorithm, by Basten and Hoogerbrugge, our algorithm has significantly better performance and latency.