A Duplication Based Algorithm for Optimizing Latency Under Throughput Constraints for Streaming Workflows

Authors:
Nagavijayalakshmi Vydyanathan;Umit Catalyurek;Tahsin Kurc;Ponnuswamy Sadayappan;Joel Saltz
Venue:
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Year:
2008

Cited by (4):
Computing the throughput of probabilistic and replicated streaming applications. Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Models and complexity results for performance and energy optimization of concurrent streaming applications. International Journal of High Performance Computing Applications
Optimizing latency and throughput of application workflows on clusters. Parallel Computing
Multi-objective exploitation of pipeline parallelism using clustering, replication and duplication in embedded multi-core systems. Journal of Systems Architecture: the EUROMICRO Journal


Abstract

Scheduling in many application domains involves optimizing multiple performance metrics. For example, application workflows with real-time constraints have strict throughput requirements and also benefit from low latency (response time). In this paper, we present a novel algorithm for scheduling workflows that act on a stream of input data. Our algorithm focuses on two performance metrics, latency and throughput, and minimizes workflow latency while satisfying strict throughput requirements. We leverage pipelined, task, and data parallelism in a coordinated manner to meet these objectives, and we investigate the benefit of task duplication in alleviating communication overheads in the pipelined schedule for different workflow characteristics. The proposed algorithm is designed for a realistic k-port communication model, in which each processor can simultaneously communicate with at most k distinct processors. Evaluation using synthetic and application benchmarks shows that our algorithm consistently produces lower-latency schedules and meets throughput requirements, even when previously proposed schemes fail.
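
The k-port constraint mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm. It only checks whether a set of transfers assumed to be active at the same time keeps every processor within k distinct communication partners. The function name `respects_k_port`, the `(sender, receiver)` pair representation, and the choice to count sends and receives toward the same limit are assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict


def respects_k_port(transfers, k):
    """Return True if no processor communicates with more than k distinct peers.

    `transfers` is an iterable of (sender, receiver) pairs that are assumed
    to be active simultaneously. Sends and receives are counted toward the
    same per-processor limit (an assumption, not taken from the paper).
    """
    peers = defaultdict(set)
    for src, dst in transfers:
        if src == dst:
            continue  # data staying on one processor uses no port
        peers[src].add(dst)
        peers[dst].add(src)
    return all(len(partners) <= k for partners in peers.values())


# Hypothetical processor names used only for this example.
two_peers = [("P0", "P1"), ("P0", "P2")]
three_peers = [("P0", "P1"), ("P0", "P2"), ("P0", "P3")]
print(respects_k_port(two_peers, 2))
print(respects_k_port(three_peers, 2))
```

In this sketch, P0 communicates with two peers in the first pattern and three in the second, so only the first pattern fits a 2-port limit.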