The stream-processing model is a natural fit for multicore systems because it exposes the inherent locality and concurrency of a program and highlights its separable tasks for efficient parallel implementation. We present flexible filters, a load-balancing optimization technique for stream programs. Flexible filters use the programmability of the cores to improve the data-processing throughput of individual bottleneck tasks by “borrowing” resources from neighbors in the stream. Our technique is distributed and scalable because all runtime load-balancing decisions are based on point-to-point handshake signals exchanged between neighboring cores. Load balancing with flexible filters increases the system-level processing throughput of stream applications, particularly those with large dynamic variations in the computational load of their tasks. We empirically evaluate flexible filters in a homogeneous multicore environment over a suite of five real-world stream programs.
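As a rough illustration of the idea, and not the implementation evaluated in the paper, the toy simulation below models a two-stage pipeline in which a fast upstream core A feeds a slow bottleneck core B through a bounded queue. The function name `simulate`, its parameters, the tick-based timing, and the use of a "queue full" check as a stand-in for the handshake signal are all assumptions made for this sketch. It also ignores output ordering, which a real stream runtime would have to preserve.

```python
from collections import deque

def simulate(n_items, cost_a, cost_b, queue_cap, flexible):
    """Count ticks needed to push n_items through the pipeline A -> B.

    Hypothetical toy model: cost_a and cost_b are per-item costs in ticks
    (both >= 1). A full queue plays the role of the backpressure signal.
    """
    queue = deque()               # bounded channel from A to B
    started = 0                   # items A has begun working on
    done = 0                      # items whose B-task finished (on either core)
    a_state, a_left = "idle", 0   # idle | own | blocked | borrowed
    b_left = 0                    # ticks left in B's current job (0 = idle)
    ticks = 0
    while done < n_items:
        ticks += 1
        # Core B: take the next queued item when idle, then work one tick.
        if b_left == 0 and queue:
            queue.popleft()
            b_left = cost_b
        if b_left > 0:
            b_left -= 1
            if b_left == 0:
                done += 1
        # Core A: start its own task on a fresh item when idle.
        if a_state == "idle" and started < n_items:
            a_state, a_left = "own", cost_a
            started += 1
        if a_state in ("own", "borrowed"):
            a_left -= 1
            if a_left == 0:
                if a_state == "own":
                    a_state = "blocked"   # output ready, needs a queue slot
                else:
                    done += 1             # A completed B's task itself
                    a_state = "idle"
        if a_state == "blocked":
            if len(queue) < queue_cap:
                queue.append(started)     # normal hand-off to B
                a_state = "idle"
            elif flexible:
                # Backpressure: instead of stalling, A runs B's task
                # on the item it is holding, starting next tick.
                a_state, a_left = "borrowed", cost_b
            # Otherwise A stalls until B frees a slot.
    return ticks

rigid = simulate(100, 1, 3, 2, flexible=False)
flex = simulate(100, 1, 3, 2, flexible=True)
print(rigid, flex)
```

In this model the rigid pipeline can never finish faster than B alone can process every item, whereas the flexible variant lets the otherwise stalled core A absorb part of B's load, so it finishes in fewer ticks. The decision uses only the state of the channel between the two neighbors, which is the property the abstract credits for making the technique distributed.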