Orchestrating the execution of stream programs on multicore platforms

Authors:
Manjunath Kudlur; Scott Mahlke
Affiliations:
University of Michigan, Ann Arbor, MI, USA; University of Michigan, Ann Arbor, MI, USA
Venue:
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Year:
2008

Abstract

While multicore hardware has become ubiquitous, explicitly parallel programming models and compiler techniques for exploiting parallelism on these systems have noticeably lagged behind. Stream programming is one model that has wide applicability in the multimedia, graphics, and signal processing domains. Streaming models execute as a set of independent actors that explicitly communicate data through channels. This paper presents a compiler technique for planning and orchestrating the execution of streaming applications on multicore platforms. An integrated unfolding and partitioning step based on integer linear programming is presented that unfolds data parallel actors as needed and maximally packs actors onto cores. Next, the actors are assigned to pipeline stages in such a way that all communication is maximally overlapped with computation on the cores. To facilitate experimentation, a generalized code generation template for mapping the software pipeline onto the Cell architecture is presented. For a range of streaming applications, a geometric mean speedup of 14.7x is achieved on a 16-core Cell platform compared to a single core.
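
The two steps named in the abstract, packing actors onto cores and then assigning them to pipeline stages, can be illustrated with a small sketch. This is not the paper's integer linear programming formulation: an exhaustive search stands in for the ILP, unfolding of data-parallel actors is not modeled, and the objective (minimize the load on the busiest core) is an assumption about what "maximally packs" means. The stage rule, which leaves one stage free for the data transfer whenever producer and consumer sit on different cores, is likewise an assumption chosen to show how communication could overlap with computation. The actor names, work estimates, and the functions `partition` and `assign_stages` are hypothetical.

```python
from itertools import product


def partition(work, num_cores):
    """Try every actor-to-core assignment and keep the one whose busiest
    core has the smallest total work (brute force stand-in for an ILP)."""
    actors = sorted(work)
    best_load, best_assign = None, None
    for cores in product(range(num_cores), repeat=len(actors)):
        loads = [0] * num_cores
        for actor, core in zip(actors, cores):
            loads[core] += work[actor]
        bottleneck = max(loads)
        if best_load is None or bottleneck < best_load:
            best_load, best_assign = bottleneck, dict(zip(actors, cores))
    return best_load, best_assign


def assign_stages(order, preds, core_of):
    """Give each actor a pipeline stage.

    order   : actors in topological order (producers before consumers)
    preds   : dict mapping an actor to the list of its producers
    core_of : dict mapping an actor to its core

    Assumed rule: a consumer on the same core as its producer may share
    the producer's stage; a consumer on another core is placed two stages
    later, so the stage in between is available for the transfer.
    """
    stage = {}
    for actor in order:
        earliest = 0
        for p in preds.get(actor, []):
            gap = 0 if core_of[p] == core_of[actor] else 2
            earliest = max(earliest, stage[p] + gap)
        stage[actor] = earliest
    return stage


# Hypothetical four-actor pipeline A -> B -> C -> D with made-up work estimates.
work = {"A": 4, "B": 6, "C": 6, "D": 4}
preds = {"B": ["A"], "C": ["B"], "D": ["C"]}

bottleneck, core_of = partition(work, num_cores=2)
stages = assign_stages(["A", "B", "C", "D"], preds, core_of)

print("bottleneck load:", bottleneck)
print("core assignment:", core_of)
print("pipeline stages:", stages)
```

The exhaustive search grows exponentially with the number of actors, so it is only usable for toy graphs; it is here to make the objective concrete, not to suggest how a real compiler would solve the problem.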