Static scheduling of synchronous data flow programs for digital signal processing
IEEE Transactions on Computers
Static Rate-Optimal Scheduling of Iterative Data-Flow Programs Via Optimum Unfolding
IEEE Transactions on Computers
Compile-Time Scheduling and Assignment of Data-Flow Program Graphs with Data-Dependent Iteration
IEEE Transactions on Computers
Code generation schema for modulo scheduled loops
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Modulo scheduling for a fully-distributed clustered VLIW architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
A stream compiler for communication-exposed architectures
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Cg: a system for programming graphics hardware in a C-like language
ACM SIGGRAPH 2003 Papers
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Stream Programming on General-Purpose Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Data and Computation Transformations for Brook Streaming Applications on Multiprocessors
Proceedings of the International Symposium on Code Generation and Optimization
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
CellSs: a programming model for the cell BE architecture
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Dynamic multigrain parallelization on the cell broadband engine
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Compilation for explicitly managed memory hierarchies
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
A programming model for an embedded media processing architecture
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Optimus: efficient realization of streaming applications on FPGAs
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Copy or Discard execution model for speculative parallelization on multicores
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
From SODA to scotch: The evolution of a wireless baseband processor
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Scheduling dynamic parallelism on accelerators
Proceedings of the 6th ACM conference on Computing frontiers
Synergistic execution of stream programs on multicores with accelerators
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Software Pipelined Execution of Stream Programs on GPUs
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Stream Compilation for Real-Time Embedded Multicore Systems
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Multicore Scheduling for Lightweight Communicating Processes
COORDINATION '09 Proceedings of the 11th International Conference on Coordination Models and Languages
Flexible filters: load balancing through backpressure for stream programs
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
Mapping stream programs onto heterogeneous multiprocessor systems
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
A computing origami: folding streams in FPGAs
Proceedings of the 46th Annual Design Automation Conference
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Input-driven dynamic execution prediction of streaming applications
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Hardware/software partitioning and pipelined scheduling on runtime reconfigurable FPGAs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
MacroSS: macro-SIMDization of streaming applications
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Speculative parallelization of sequential loops on multicores
International Journal of Parallel Programming
Minimizing communication in rate-optimal software pipelining for stream programs
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Proceedings of the 7th ACM international conference on Computing frontiers
Bamboo: a data-centric, object-oriented approach to many-core software
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications
Proceedings of the 37th annual international symposium on Computer architecture
Partitioning streaming parallelism for multi-cores: a machine learning based approach
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Compilation of stream programs for multicore processors that incorporate scratchpad memories
Proceedings of the Conference on Design, Automation and Test in Europe
Accelerating large-scale DEVS-based simulation on the cell processor
SpringSim '10 Proceedings of the 2010 Spring Simulation Multiconference
Resource recycling: putting idle resources to work on a composable accelerator
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Distributed stream processing with DUP
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Loop transformations: convexity, pruning and optimization
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Orchestration by approximation: mapping stream programs onto multicore architectures
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Sponge: portable stream programming on graphics engines
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Exploring Multi-Grained Parallelism in Compute-Intensive DEVS Simulations
PADS '10 Proceedings of the 2010 IEEE Workshop on Principles of Advanced and Distributed Simulation
Proceedings of the 48th Design Automation Conference
A constraint based approach to cyclic RCPSP
CP'11 Proceedings of the 17th international conference on Principles and practice of constraint programming
Optimizing modulo scheduling to achieve reuse and concurrency for stream processors
The Journal of Supercomputing
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Scalable framework for mapping streaming applications onto multi-GPU systems
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Mapping streaming languages to general purpose processors through vectorization
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Automatic data distribution for improving data locality on the cell BE architecture
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
TransCom: transforming stream communication for load balance and efficiency in networks-on-chip
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Multicore scheduling for lightweight communicating processes
Science of Computer Programming
Buffer sizing for self-timed stream programs on heterogeneous distributed memory multiprocessors
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Characteristics of workloads using the pipeline programming model
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Adaptive task duplication using on-line bottleneck detection for streaming applications
Proceedings of the 9th conference on Computing Frontiers
Unrolling and retiming of stream applications onto embedded multicore processors
Proceedings of the 49th Annual Design Automation Conference
StreamX10: a stream programming framework on X10
Proceedings of the 2012 ACM SIGPLAN X10 Workshop
Profile-guided deployment of stream programs on multicores
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Adaptive input-aware compilation for graphics engines
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
StreamPI: a stream-parallel programming extension for object-oriented programming languages
The Journal of Supercomputing
Global cyclic cumulative constraint
CPAIOR'12 Proceedings of the 9th international conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems
Dynamic scheduling of stream programs on embedded multi-core processors
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Sigma*: symbolic learning of input-output specifications
POPL '13 Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Mapping of streaming applications considering alternative application specifications
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
StreamTMC: Stream compilation for tiled multi-core architectures
Journal of Parallel and Distributed Computing
Kernel Partitioning of Streaming Applications: A Statistical Approach to an NP-complete Problem
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
A general constraint-centric scheduling framework for spatial architectures
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Exploiting just-enough parallelism when mapping streaming applications in hard real-time systems
Proceedings of the 50th Annual Design Automation Conference
Combining module selection and replication for throughput-driven streaming programs
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Orchestrating stream graphs using model checking
ACM Transactions on Architecture and Code Optimization (TACO)
Using machine learning to partition streaming programs
ACM Transactions on Architecture and Code Optimization (TACO)
DANBI: dynamic scheduling of irregular stream programs for many-core systems
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Maximum-throughput mapping of SDFGs on multi-core SoC platforms
Journal of Parallel and Distributed Computing
Flexible filters in stream programs
ACM Transactions on Embedded Computing Systems (TECS)
Throughput-memory footprint trade-off in synthesis of streaming software on embedded multiprocessors
ACM Transactions on Embedded Computing Systems (TECS)
Combining computation and communication optimizations in system synthesis for streaming applications
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
StreaMorph: a case for synthesizing energy-efficient adaptive programs using high-level abstractions
Proceedings of the Eleventh ACM International Conference on Embedded Software
CROSS cyclic resource-constrained scheduling solver
Artificial Intelligence
Hi-index | 0.00 |
While multicore hardware has become ubiquitous, explicitly parallel programming models and compiler techniques for exploiting parallelism on these systems have noticeably lagged behind. Stream programming is one model that has wide applicability in the multimedia, graphics, and signal processing domains. Streaming models execute as a set of independent actors that explicitly communicate data through channels. This paper presents a compiler technique for planning and orchestrating the execution of streaming applications on multicore platforms. An integrated unfolding and partitioning step based on integer linear programming is presented that unfolds data parallel actors as needed and maximally packs actors onto cores. Next, the actors are assigned to pipeline stages in such a way that all communication is maximally overlapped with computation on the cores. To facilitate experimentation, a generalized code generation template for mapping the software pipeline onto the Cell architecture is presented. For a range of streaming applications, a geometric mean speedup of 14.7x is achieved on a 16-core Cell platform compared to a single core.