Scheduling pipelined communication in distributed memory multiprocessors for real-time applications
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Optimal latency-throughput tradeoffs for data parallel pipelines
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Precedence-Constrained Task Allocation onto Point-to-Point Networks for Pipelined Execution
IEEE Transactions on Parallel and Distributed Systems
Static scheduling algorithms for allocating directed task graphs to multiprocessors
ACM Computing Surveys (CSUR)
Computers and Intractability; A Guide to the Theory of NP-Completeness
Computers and Intractability; A Guide to the Theory of NP-Completeness
A Pipeline-Based Approach for Scheduling Video Processing Algorithms on NOW
IEEE Transactions on Parallel and Distributed Systems
Executing multiple pipelined data analysis operations in the grid
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Computers and Operations Research
Large image correction and warping in a cluster environment
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Computing the throughput of probabilistic and replicated streaming applications
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
International Journal of High Performance Computing Applications
Optimizing latency and throughput of application workflows on clusters
Parallel Computing
A survey of pipelined workflow scheduling: Models and algorithms
ACM Computing Surveys (CSUR)
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.00 |
In many application domains, it is desirable to meet some user-defined performance requirement while minimizing resource usage and optimizing additional performance parameters. For example, application workflows with real-time constraints may have strict throughput requirements and desire a low latency or response-time. The structure of these workflows can be represented as directed acyclic graphs of coarse-grained application tasks with data dependences. In this paper, we develop a novel mapping and scheduling algorithm that minimizes the latency of workflows that act on a stream of input data, while satisfying throughput requirements. The algorithm employs pipelined parallelism and intelligent clustering and replication of tasks to meet throughput requirements. Latency is minimized by exploiting task parallelism and reducing communication overheads. Evaluation using synthetic benchmarks and application task graphs shows that our algorithm 1) consistently meets throughput requirements even when other existing schemes fail, 2) produces lower-latency schedules, and 3) results in lesser resource usage.