Optimal latency-throughput tradeoffs for data parallel pipelines
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Performance Evaluation of Systems Using Nets
Proceedings of the Advanced Course on General Net Theory of Processes and Systems: Net Theory and Applications
A coupled hardware and software architecture for programmable digital signal processors (synchronous data flow)
Merrimac: Supercomputing with Streams
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Stream Programming on General-Purpose Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 43rd annual Design Automation Conference
Compiling for stream processing
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Performance Evaluation of Asynchronous Concurrent Systems Using Petri Nets
IEEE Transactions on Software Engineering
Streamware: programming general-purpose multicore processors using streams
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Orchestrating the execution of stream programs on multicore platforms
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Embedded Multiprocessors: Scheduling and Synchronization
Embedded Multiprocessors: Scheduling and Synchronization
Elastic scaling of data parallel operators in stream processing
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Flexible filters: load balancing through backpressure for stream programs
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Journal of Computer and System Sciences
In this paper we describe an approach to dynamically improve the progress of streaming applications on SMP multi-core systems. We show that run-time task duplication is an effective method for maximizing application throughput in the face of changes in available computing resources. Such changes cannot be fully handled by static optimizations. We derive a theoretical performance model to identify tasks that need more computing resources, and we propose two on-line algorithms that use indications from this model to detect computation bottlenecks. In these algorithms, a task can identify itself as a bottleneck using only its local data. The proposed technique is transparent to end programmers and portable to systems with fair scheduling. Our on-line detection algorithms can also be applied to other dynamic scenarios, for example those involving run-time variation of workload. Our experiments using the StreamIt benchmarks [5] show that the proposed run-time task duplication achieves considerable speedups over the multi-threaded baseline, both on a 16-core machine and in scenarios with a dynamically changing number of processing cores. We also show that our algorithms achieve better application throughput than alternative approaches to task duplication.
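The abstract does not spell out the two detection algorithms. As a rough illustration only of what "a task can identify itself as a bottleneck using only its local data" might look like, the sketch below assumes a task can sample how full its own input and output buffers are. The names `QueueSample` and `is_local_bottleneck`, the buffer-occupancy heuristic, and all thresholds are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QueueSample:
    """One local observation by a task (hypothetical data layout)."""
    input_fill: float   # fraction of the task's input buffer in use, 0.0 to 1.0
    output_fill: float  # fraction of the task's output buffer in use, 0.0 to 1.0


def is_local_bottleneck(
    samples: Sequence[QueueSample],
    high: float = 0.8,          # assumed threshold: input counts as "backed up"
    low: float = 0.2,           # assumed threshold: output counts as "starved"
    min_fraction: float = 0.75, # assumed share of samples that must agree
) -> bool:
    """Guess whether a task is a bottleneck from its own buffers only.

    Assumed heuristic (not the paper's algorithm): a task whose input
    buffer stays nearly full while its output buffer stays nearly empty
    is consuming more slowly than its neighbours produce and consume.
    """
    if not samples:
        return False
    hits = sum(
        1 for s in samples
        if s.input_fill >= high and s.output_fill <= low
    )
    return hits / len(samples) >= min_fraction


if __name__ == "__main__":
    window = [QueueSample(0.9, 0.1)] * 3 + [QueueSample(0.5, 0.5)]
    if is_local_bottleneck(window):
        print("task would request duplication")
```

Under these assumptions, a runtime could duplicate any task that reports itself as a bottleneck over a recent window of samples. The check needs no global view of the pipeline, which is the property the abstract emphasizes.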