Toward optimizing latency under throughput constraints for application workflows on clusters

Authors:
Nagavijayalakshmi Vydyanathan; Umit V. Catalyurek; Tahsin M. Kurc; Ponnuswamy Sadayappan; Joel H. Saltz
Affiliations:
Dept. of Computer Science and Engineering, The Ohio State University; Dept. of Biomedical Informatics, The Ohio State University; Dept. of Biomedical Informatics, The Ohio State University; Dept. of Computer Science and Engineering, The Ohio State University; Dept. of Biomedical Informatics, The Ohio State University
Venue:
Euro-Par'07: Proceedings of the 13th International Euro-Par Conference on Parallel Processing
Year:
2007

Abstract

In many application domains, it is desirable to meet a user-defined performance requirement while minimizing resource usage and optimizing additional performance parameters. For example, application workflows with real-time constraints may have strict throughput requirements and also require low latency (response time). The structure of these workflows can be represented as directed acyclic graphs of coarse-grained application tasks with data dependences. In this paper, we develop a novel mapping and scheduling algorithm that minimizes the latency of workflows that process a stream of input data while satisfying throughput requirements. The algorithm employs pipelined parallelism and intelligent clustering and replication of tasks to meet throughput requirements. Latency is minimized by exploiting task parallelism and reducing communication overheads. Evaluation using synthetic benchmarks and application task graphs shows that our algorithm 1) consistently meets throughput requirements even when other existing schemes fail, 2) produces lower-latency schedules, and 3) uses fewer resources.
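The interplay between throughput and latency described above can be illustrated with a simplified cost model. The sketch below is not the authors' algorithm: it assumes each task runs on its own processors (no clustering), ignores communication costs, and assumes that r replicas of a task process independent stream items concurrently, so a task with per-item time t sustains r / t items per second. The workflow (`EXEC_TIME`, `SUCC`) and the helpers `replicas_needed`, `longest_path_from`, and `summarize` are hypothetical names chosen for illustration.

```python
import math
from functools import lru_cache

# Hypothetical example workflow (not from the paper): per-item execution
# time of each coarse-grained task, in seconds.
EXEC_TIME = {"A": 2.0, "B": 6.0, "C": 3.0, "D": 1.0}
# Data dependences: task -> successor tasks (a DAG).
SUCC = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def replicas_needed(t: float, required: float) -> int:
    """Smallest r with r / t >= required, assuming r replicas of a task
    process independent stream items concurrently (round-robin)."""
    return max(1, math.ceil(t * required))


@lru_cache(maxsize=None)
def longest_path_from(task: str) -> float:
    """Per-item latency from `task` to a sink: the sum of execution times
    along the longest path. Communication costs are ignored here."""
    return EXEC_TIME[task] + max(
        (longest_path_from(s) for s in SUCC[task]), default=0.0
    )


def summarize(required: float):
    """Return (replica counts, achieved throughput, per-item latency)."""
    reps = {t: replicas_needed(et, required) for t, et in EXEC_TIME.items()}
    # Pipeline throughput is limited by the slowest (replicated) task.
    throughput = min(reps[t] / EXEC_TIME[t] for t in EXEC_TIME)
    # Entry tasks are those that never appear as a successor.
    has_pred = {s for succs in SUCC.values() for s in succs}
    sources = [t for t in EXEC_TIME if t not in has_pred]
    latency = max(longest_path_from(t) for t in sources)
    return reps, throughput, latency


if __name__ == "__main__":
    reps, thr, lat = summarize(required=0.5)  # 0.5 items per second
    print(reps, thr, lat)
```

With a requirement of 0.5 items per second, task B (6 s per item) needs three replicas, while the per-item latency stays at the longest path A, B, D (9 s) no matter how many replicas are added. This is why, in the model above, replication addresses throughput while lowering latency calls for other means, such as exploiting task parallelism and reducing communication overheads.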