Exploiting throughput in a parallel application that processes an input data stream is a difficult challenge. For typical coarse-grain applications, where the computation time of tasks is greater than their communication time, the maximum achievable throughput is determined by the maximum task computation time. Improving throughput beyond this limit would therefore eventually require modifying the source code of the tasks. In this work, we address the improvement of throughput by proposing two task replication methodologies that take the target throughput as an input parameter. They proceed by generating a new task graph structure that permits the target throughput to be achieved. The first replication mechanism, named DPRM (Data Parallel Replication Mechanism), exploits the inner data parallelism of a task. The second mechanism, named TCRM (Task Copy Replication Mechanism), creates new execution paths inside the application task graph structure that allow more than one instance of data to be processed concurrently. We evaluate the effectiveness of these mechanisms with three real applications executed on a cluster system: the MPEG2 video compressor, the IVUS (Intra-Vascular Ultra-Sound) medical image application and the BASIZ (Bright and SAturated Images Zone) video processing application. In all these cases, the throughput obtained after applying the proposed replication mechanisms was greater than what the original implementation could provide.
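The relationship between the slowest task and the achievable throughput can be illustrated with a small sketch. It is not DPRM or TCRM as defined in the paper; it assumes an idealized pipeline in which communication, data splitting and merging cost nothing, and in which k replicas of a task sustain k times that task's rate. The task names, timings and function names are hypothetical.

```python
import math

def pipeline_throughput(task_times, replicas=None):
    """Items per second of a pipeline, limited by its slowest stage.

    task_times maps a task name to its computation time per item (seconds).
    replicas maps a task name to its number of copies (default: one each).
    Assumption: k replicas of a task process k items concurrently.
    """
    if replicas is None:
        replicas = {name: 1 for name in task_times}
    return min(replicas[name] / t for name, t in task_times.items())

def replicas_for_target(task_times, target_throughput):
    """Smallest replica count per task that reaches the target throughput
    under the idealized model above (no communication or merge overhead)."""
    return {
        name: max(1, math.ceil(t * target_throughput))
        for name, t in task_times.items()
    }

# Hypothetical three-stage pipeline; times in seconds per data item.
times = {"read": 0.25, "transform": 0.5, "write": 0.125}

baseline = pipeline_throughput(times)         # bounded by "transform"
plan = replicas_for_target(times, 4.0)        # target: 4 items per second
improved = pipeline_throughput(times, plan)

print(baseline, plan, improved)
```

In this model only the task whose computation time exceeds the inverse of the target throughput receives extra copies, which reflects the idea of taking the target throughput as an input and reshaping the task graph to meet it. A real deployment would also have to account for communication and for the cost of distributing and collecting data.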