Static scheduling of synchronous data flow programs for digital signal processing
IEEE Transactions on Computers
Communicating process architecture: transputers and Occam
Proc. of an advanced course on Future parallel computers.
Static Rate-Optimal Scheduling of Iterative Data-Flow Programs Via Optimum Unfolding
IEEE Transactions on Computers
IEEE Transactions on Parallel and Distributed Systems
Partitioning and pipelining for performance-constrained hardware/software systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Static scheduling algorithms for allocating directed task graphs to multiprocessors
ACM Computing Surveys (CSUR)
Introduction to the Theory of Computation
Introduction to the Theory of Computation
Hardware-Software partitioning and pipelined scheduling of transformative applications
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A stream compiler for communication-exposed architectures
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Scheduling Data-Flow Graphs via Retiming and Unfolding
IEEE Transactions on Parallel and Distributed Systems
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Phased scheduling of stream programs
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Linear analysis and optimization of stream programs
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Cg: a system for programming graphics hardware in a C-like language
ACM SIGGRAPH 2003 Papers
A Hierarchical Multiprocessor Scheduling Framework for
A Hierarchical Multiprocessor Scheduling Framework for
Programmable Stream Processors
Computer
Spidle: a DSL approach to specifying streaming applications
Proceedings of the 2nd international conference on Generative programming and component engineering
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Cache aware optimization of stream programs
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Optimizing stream programs using linear state space analysis
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Automatic Thread Extraction with Decoupled Software Pipelining
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Stream Programming on General-Purpose Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Data and Computation Transformations for Brook Streaming Applications on Multiprocessors
Proceedings of the International Symposium on Code Generation and Optimization
IEEE Micro
MPEG-2 decoding in a stream programming language
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A programming model for an embedded media processing architecture
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Program mapping onto network processors by recursive bipartitioning and refining
Proceedings of the 44th annual Design Automation Conference
Hierarchical coarse-grained stream compilation for software defined radio
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Automated task distribution in multicore network processors using statistical analysis
Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
Code-size conscious pipelining of imperfectly nested loops
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Optimistic parallelism benefits from data partitioning
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Streamware: programming general-purpose multicore processors using streams
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Software engineering for multicore systems: an experience report
Proceedings of the 1st international workshop on Multicore software engineering
Orchestrating the execution of stream programs on multicore platforms
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Design and evaluation of a compiler for embedded stream programs
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Optimizing scientific application loops on stream processors
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Harmony: an execution model and runtime for heterogeneous many core systems
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
A lightweight streaming layer for multicore execution
ACM SIGARCH Computer Architecture News
Exact and approximate task assignment algorithms for pipelined software synthesis
Proceedings of the conference on Design, automation and test in Europe
IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
SAMOS '08 Proceedings of the 8th international workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Liquid Metal: Object-Oriented Programming Across the Hardware/Software Boundary
ECOOP '08 Proceedings of the 22nd European conference on Object-Oriented Programming
Stream Scheduling: A Framework to Manage Bulk Operations in Memory Hierarchies
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Energy efficient streaming applications with guaranteed throughput on MPSoCs
EMSOFT '08 Proceedings of the 8th ACM international conference on Embedded software
Optimus: efficient realization of streaming applications on FPGAs
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Throughput-driven synthesis of embedded software for pipelined execution on multicore architectures
ACM Transactions on Embedded Computing Systems (TECS)
Languages and Compilers for Parallel Computing
Matrix-based streamization approach for improving locality and parallelism on FT64 stream processor
The Journal of Supercomputing
A comparison of programming models for multiprocessors with explicitly managed memory hierarchies
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
MPSoC Design Using Application-Specific Architecturally Visible Communication
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Mapping and Synchronizing Streaming Applications on Cell Processors
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Diastolic arrays: throughput-driven reconfigurable computing
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Partial order method for timed simulation of system-level MPSoC designs
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Prototyping pipelined applications on a heterogeneous FPGA multiprocessor virtual platform
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Fast and accurate performance simulation of embedded software for MPSoC
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
The Complexity of Predicting Atomicity Violations
TACAS '09 Proceedings of the 15th International Conference on Tools and Algorithms for the Construction and Analysis of Systems: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009,
Stream processing for fast and efficient rotated Haar-like features using rotated integral images
International Journal of Intelligent Systems Technologies and Applications
SLIPstream: scalable low-latency interactive perception on streaming data
Proceedings of the 18th international workshop on Network and operating systems support for digital audio and video
Synergistic execution of stream programs on multicores with accelerators
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Software Pipelined Execution of Stream Programs on GPUs
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Stream Compilation for Real-Time Embedded Multicore Systems
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
A domain-specific approach for software development on Manycore platforms
ACM SIGARCH Computer Architecture News
Proceedings of the 11th Annual conference on Genetic and evolutionary computation
Multicore Scheduling for Lightweight Communicating Processes
COORDINATION '09 Proceedings of the 11th International Conference on Coordination Models and Languages
Beyond nested parallelism: tight bounds on work-stealing overheads for parallel futures
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Partial memoization of concurrency and communication
Proceedings of the 14th ACM SIGPLAN international conference on Functional programming
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Journal of Signal Processing Systems
XJava: Exploiting Parallelism with Object-Oriented Stream Programming
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Flexible filters: load balancing through backpressure for stream programs
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
Mapping stream programs onto heterogeneous multiprocessor systems
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
A platform for developing adaptable multicore applications
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
A computing origami: folding streams in FPGAs
Proceedings of the 46th Annual Design Automation Conference
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Input-driven dynamic execution prediction of streaming applications
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
MacroSS: macro-SIMDization of streaming applications
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
An analytical model to exploit memory task scheduling
Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture
Minimizing communication in rate-optimal software pipelining for stream programs
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
A streaming machine description and programming model
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Proceedings of the 7th ACM international conference on Computing frontiers
Automatic contention detection and amelioration for data-intensive operations
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Manycore performance analysis using timed configuration graphs
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
Data marshaling for multi-core architectures
Proceedings of the 37th annual international symposium on Computer architecture
Versatile task assignment for heterogeneous soft dual-processor platforms
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Adaptive locks: Combining transactions and locks for efficient concurrency
Journal of Parallel and Distributed Computing
Feedback-directed pipeline parallelism
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Partitioning streaming parallelism for multi-cores: a machine learning based approach
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Lime: a Java-compatible and synthesizable language for heterogeneous architectures
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Skewed pipelining for parallel simulink simulations
Proceedings of the Conference on Design, Automation and Test in Europe
Proceedings of the Conference on Design, Automation and Test in Europe
Resource recycling: putting idle resources to work on a composable accelerator
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
A language-based tuning mechanism for task and pipeline parallelism
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
A stream-computing extension to OpenMP
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Orchestration by approximation: mapping stream programs onto multicore architectures
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Sponge: portable stream programming on graphics engines
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Parallelization libraries: Characterizing and reducing overheads
ACM Transactions on Architecture and Code Optimization (TACO)
Scheduling of stream-based real-time applications for heterogeneous systems
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Analysis and performance results of computing betweenness centrality on IBM Cyclops64
The Journal of Supercomputing
The tao of parallelism in algorithms
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Flow: A Stream Processing System Simulator
PADS '10 Proceedings of the 2010 IEEE Workshop on Principles of Advanced and Distributed Simulation
Evaluation of dynamic voltage and frequency scaling for stream programs
Proceedings of the 8th ACM International Conference on Computing Frontiers
Proceedings of the 48th Design Automation Conference
Task ordering and memory management problem for degree of parallelism estimation
COCOON'11 Proceedings of the 17th annual international conference on Computing and combinatorics
A monad for deterministic parallelism
Proceedings of the 4th ACM symposium on Haskell
Adaptive parallel approximate similarity search for responsive multimedia retrieval
Proceedings of the 20th ACM international conference on Information and knowledge management
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II
PLDS: Partitioning linked data structures for parallelism
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Scalable framework for mapping streaming applications onto multi-GPU systems
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Multicore scheduling for lightweight communicating processes
Science of Computer Programming
Buffer sizing for self-timed stream programs on heterogeneous distributed memory multiprocessors
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Single thread program parallelism with dataflow abstracting thread
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Characteristics of workloads using the pipeline programming model
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Adaptive task duplication using on-line bottleneck detection for streaming applications
Proceedings of the 9th conference on Computing Frontiers
ACM Transactions on Embedded Computing Systems (TECS)
Unrolling and retiming of stream applications onto embedded multicore processors
Proceedings of the 49th Annual Design Automation Conference
StreamX10: a stream programming framework on X10
Proceedings of the 2012 ACM SIGPLAN X10 Workshop
Profile-guided deployment of stream programs on multicores
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Compiling a high-level language for GPUs: (via language support for architectures and compilers)
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Adaptive input-aware compilation for graphics engines
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
From a calculus to an execution environment for stream processing
Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems
StreamPI: a stream-parallel programming extension for object-oriented programming languages
The Journal of Supercomputing
Integrating Memory Optimization with Mapping Algorithms for Multi-Processors System-on-Chip
ACM Transactions on Embedded Computing Systems (TECS)
Avalanche: a fine-grained flow graph model for irregular applications on distributed-memory systems
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
Usage of petri nets for high performance computing
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
PQL: a purely-declarative java extension for parallel programming
ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
Auto-parallelizing stateful distributed streaming applications
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Coalition threading: combining traditional andnon-traditional parallelism to maximize scalability
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Programming parallelism with futures in lustre
Proceedings of the tenth ACM international conference on Embedded software
Automatic extraction of multi-objective aware pipeline parallelism using genetic algorithms
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Integrating task parallelism with actors
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Automatic generation of software pipelines for heterogeneous parallel systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Sigma*: symbolic learning of input-output specifications
POPL '13 Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Mapping of streaming applications considering alternative application specifications
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
MultiMaKe: Chip-multiprocessor driven memory-aware kernel pipelining
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
StreamTMC: Stream compilation for tiled multi-core architectures
Journal of Parallel and Distributed Computing
Optimizing tensor contraction expressions for hybrid CPU-GPU execution
Cluster Computing
Kernel Partitioning of Streaming Applications: A Statistical Approach to an NP-complete Problem
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Scheduling linear chain streaming applications on heterogeneous systems with failures
Future Generation Computer Systems
Pipelining for cyclic control systems
Proceedings of the 16th international conference on Hybrid systems: computation and control
Navigating big data with high-throughput, energy-efficient data partitioning
Proceedings of the 40th Annual International Symposium on Computer Architecture
On-the-fly pipeline parallelism
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Dynamic expressivity with static optimization for streaming languages
Proceedings of the 7th ACM international conference on Distributed event-based systems
Tutorial: stream processing optimizations
Proceedings of the 7th ACM international conference on Distributed event-based systems
Optimizations for configuring and mapping software pipelines in many core systems
Proceedings of the 50th Annual Design Automation Conference
Exploiting just-enough parallelism when mapping streaming applications in hard real-time systems
Proceedings of the 50th Annual Design Automation Conference
Analysis of multi-domain scenarios for optimized dynamic power management strategies
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Combining module selection and replication for throughput-driven streaming programs
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Enhancing throughput for streaming applications running on cluster systems
Journal of Parallel and Distributed Computing
Orchestrating stream graphs using model checking
ACM Transactions on Architecture and Code Optimization (TACO)
Using machine learning to partition streaming programs
ACM Transactions on Architecture and Code Optimization (TACO)
Design-space exploration and runtime resource management for multicores
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
The shape of things to run: compiling complex stream graphs to reconfigurable hardware in lime
ECOOP'13 Proceedings of the 27th European conference on Object-Oriented Programming
A catalog of stream processing optimizations
ACM Computing Surveys (CSUR)
Flexible filters in stream programs
ACM Transactions on Embedded Computing Systems (TECS)
Throughput-memory footprint trade-off in synthesis of streaming software on embedded multiprocessors
ACM Transactions on Embedded Computing Systems (TECS)
Combining computation and communication optimizations in system synthesis for streaming applications
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
Well-structured futures and cache locality
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Automatic extraction of pipeline parallelism for embedded heterogeneous multi-core platforms
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Expandable process networks to efficiently specify and explore task, data, and pipeline parallelism
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
StreaMorph: a case for synthesizing energy-efficient adaptive programs using high-level abstractions
Proceedings of the Eleventh ACM International Conference on Embedded Software
Journal of Systems Architecture: the EUROMICRO Journal
IBM streams processing language: analyzing big data in motion
IBM Journal of Research and Development
Hi-index | 0.01 |
As multicore architectures enter the mainstream, there is a pressing demand for high-level programming models that can effectively map to them. Stream programming offers an attractive way to expose coarse-grained parallelism, as streaming applications (image, video, DSP, etc.) are naturally represented by independent filters that communicate over explicit data channels.In this paper, we demonstrate an end-to-end stream compiler that attains robust multicore performance in the face of varying application characteristics. As benchmarks exhibit different amounts of task, data, and pipeline parallelism, we exploit all types of parallelism in a unified manner in order to achieve this generality. Our compiler, which maps from the StreamIt language to the 16-core Raw architecture, attains a 11.2x mean speedup over a single-core baseline, and a 1.84x speedup over our previous work.