Profile-guided deployment of stream programs on multicores

Authors:
S. M. Farhad;Yousun Ko;Bernd Burgstaller;Bernhard Scholz
Affiliations:
The University of Sydney and NICTA;Yonsei University;Yonsei University;The University of Sydney
Venue:
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Year:
2012

Citing 29
Cited 0

Static scheduling of synchronous data flow programs for digital signal processing

IEEE Transactions on Computers
A modeling language for mathematical programming

Management Science
Smallest-last ordering and clustering and graph coloring algorithms

Journal of the ACM (JACM)
Approximation algorithms

Approximation algorithms
Software Synthesis from Dataflow Graphs

Software Synthesis from Dataflow Graphs
Baring It All to Software: Raw Machines

Computer
First version of a data flow procedure language

Programming Symposium, Proceedings Colloque sur la Programmation
StreamIt: A Language for Streaming Applications

CC '02 Proceedings of the 11th International Conference on Compiler Construction
Phased scheduling of stream programs

Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Cg: a system for programming graphics hardware in a C-like language

ACM SIGGRAPH 2003 Papers
Brook for GPUs: stream computing on graphics hardware

ACM SIGGRAPH 2004 Papers
Power Efficient Processor Architecture and The Cell Processor

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Shangri-La: achieving high performance from compiled network applications while enabling ease of programming

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Stream Programming on General-Purpose Processors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Can programming be liberated from the von Neumann style?: a functional style and its algebra of programs

ACM Turing award lectures
Streamflex: high-throughput stream programming in java

Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
Orchestrating the execution of stream programs on multicore platforms

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A lightweight streaming layer for multicore execution

ACM SIGARCH Computer Architecture News
Synergistic execution of stream programs on multicores with accelerators

Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Software Pipelined Execution of Stream Programs on GPUs

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Mapping stream programs onto heterogeneous multiprocessor systems

CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Language and compiler support for stream programs

Language and compiler support for stream programs
Minimizing communication in rate-optimal software pipelining for stream programs

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Computer Systems: A Programmer's Perspective

Computer Systems: A Programmer's Perspective
An empirical characterization of stream programs and its implications for language and compiler design

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments

ICPPW '10 Proceedings of the 2010 39th International Conference on Parallel Processing Workshops
Orchestration by approximation: mapping stream programs onto multicore architectures

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
A programming model for an embedded media processing architecture

SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Because multicore architectures have become the industry standard, programming abstractions for concurrent programming are of key importance. Stream programming languages facilitate application domains characterized by regular sequences of data, such as multimedia, graphics, signal processing and networking. With stream programs, computations are expressed through independent actors that interact through FIFO data channels. A major challenge with stream programs is to load-balance actors among available processing cores. The workload of a stream program is determined by actor execution times and the communication overhead induced by data channels. Estimating communication costs on cache-coherent shared-memory multiprocessors is difficult, because data movements are abstracted away by the cache coherence protocol. Standard execution time profiling techniques cannot separate actor execution times from communication costs, because communication costs manifest in terms of execution time overhead. In this work we present a unified Integer Linear Programming (ILP) formulation that balances the workload of stream programs on cache-coherent multicore architectures. For estimating the communication costs of data channels, we devise a novel profiling scheme that minimizes the number of profiling steps. We conduct experiments across a range of StreamIt benchmarks and show that our method achieves a speedup of up to 4.02x on 6 processors. The number of profiling steps is on average only 17% of an exhaustive profiling run over all data channels of a stream program.