Flexible filters in stream programs

Authors: Rebecca L. Collins; Luca P. Carloni
Affiliation: Columbia University (both authors)
Venue: ACM Transactions on Embedded Computing Systems (TECS)
Year: 2013

Abstract

The stream-processing model is a natural fit for multicore systems because it exposes the inherent locality and concurrency of a program and highlights its separable tasks for efficient parallel implementations. We present flexible filters, a load-balancing optimization technique for stream programs. Flexible filters use the programmability of the cores to improve the data-processing throughput of individual bottleneck tasks by “borrowing” resources from neighbors in the stream. Our technique is distributed and scalable because all runtime load-balancing decisions are based on point-to-point handshake signals exchanged between neighboring cores. Load balancing with flexible filters increases the system-level processing throughput of stream applications, particularly those with large dynamic variations in the computational load of their tasks. We empirically evaluate flexible filters in a homogeneous multicore environment over a suite of five real-world stream programs.
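
The abstract describes the mechanism only at a high level, so the following is a toy illustration rather than the authors' implementation. It simulates a two-stage stream A -> B in which B is the bottleneck, and it assumes that "borrowing" means the upstream core runs the bottleneck filter on its own pending item whenever the channel to B is full (backpressure). The function name `run_pipeline`, its parameters, the tick-based cost model, and the bounded-queue handshake are all hypothetical choices made for this sketch.

```python
from collections import deque


def run_pipeline(n_items, cost_a, cost_b, capacity, flexible):
    """Simulate a two-stage stream A -> B; return the ticks needed for n_items.

    Hypothetical model: each core does one unit of work per tick, and a
    full channel is the only signal A receives from B.
    """
    queue = deque()   # bounded channel between the two cores
    started = 0       # items A has begun to process
    finished = 0      # items that have passed through both filters
    a_state, a_left = "idle", 0
    b_left = 0        # remaining ticks of B's current item (0 means idle)
    ticks = 0

    while finished < n_items:
        ticks += 1

        # Core B: take an item from the channel if idle, then work one tick.
        if b_left == 0 and queue:
            queue.popleft()
            b_left = cost_b
        if b_left > 0:
            b_left -= 1
            if b_left == 0:
                finished += 1

        # Core A: first resolve an item that is waiting to be sent.
        if a_state == "ready":
            if len(queue) < capacity:
                queue.append(started)
                a_state = "idle"
            elif flexible:
                # Backpressure: the channel is full, so A applies B's
                # filter to this item itself instead of stalling.
                a_state, a_left = "borrowed", cost_b
            # A rigid filter simply stalls here until B frees a slot.
        if a_state == "idle" and started < n_items:
            started += 1
            a_state, a_left = "own", cost_a
        if a_state in ("own", "borrowed"):
            a_left -= 1
            if a_left == 0:
                if a_state == "borrowed":
                    finished += 1
                    a_state = "idle"
                else:
                    a_state = "ready"
    return ticks


rigid = run_pipeline(100, cost_a=1, cost_b=4, capacity=2, flexible=False)
flex = run_pipeline(100, cost_a=1, cost_b=4, capacity=2, flexible=True)
print(f"rigid: {rigid} ticks, flexible: {flex} ticks")
```

In this model the rigid pipeline can never finish faster than `n_items * cost_b` ticks, because every item must pass through core B. The flexible variant finishes sooner because core A spends its otherwise stalled time on bottleneck work, and the decision to do so depends only on the state of the channel it shares with its neighbor.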