Compile-Time Scheduling and Assignment of Data-Flow Program Graphs with Data-Dependent Iteration
IEEE Transactions on Computers
Software Synthesis from Dataflow Graphs
Software Synthesis from Dataflow Graphs
A stream compiler for communication-exposed architectures
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Viper: A Multiprocessor SOC for Advanced Set-Top Box and Digital TV Systems
IEEE Design & Test
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Queue - Multiprocessors
Heterogeneous Chip Multiprocessors
Computer
Data and Computation Transformations for Brook Streaming Applications on Multiprocessors
Proceedings of the International Symposium on Code Generation and Optimization
Dynamic thread assignment on heterogeneous multiprocessor architectures
Proceedings of the 3rd conference on Computing frontiers
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Orchestrating the execution of stream programs on multicore platforms
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A workload-aware mapping approach for data-parallel programs
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Orchestration by approximation: mapping stream programs onto multicore architectures
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Buffer sizing for self-timed stream programs on heterogeneous distributed memory multiprocessors
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Profile-guided deployment of stream programs on multicores
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
StreamPI: a stream-parallel programming extension for object-oriented programming languages
The Journal of Supercomputing
Kernel Partitioning of Streaming Applications: A Statistical Approach to an NP-complete Problem
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Orchestrating stream graphs using model checking
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
This paper presents a partitioning and allocation algorithm for an iterative stream compiler, targeting heterogeneous multiprocessors with constrained distributed memory and any communications topology. We introduce a novel definition of connectedness that enables the algorithm to model the capabilities of the compiler. The algorithm uses convexity and connectedness constraints to produce partitions that are easier to compile and require short pipelines. Software pipelining is an effective transformation, but it increases memory footprint and latency, and has a startup overhead. Our algorithm takes account of these downstream costs. We show results for the StreamIt 2.1.1 benchmarks for an SMP, 2*2 mesh, SMP plus accelerator, and IBM QS20 blade, which has two Cell processors. Our results show that the average performance is within 5% of the unrestricted optimum found using a brute force search, while seldom requiring software pipelining. The heuristic is robust, and fast enough to be inside the feedback loop of an iterative compiler.