We consider approaches that allow task migration when scheduling recurrent directed-acyclic-graph (DAG) tasks on symmetric shared-memory multiprocessors (SMPs), so that a given throughput requirement can be met with fewer processors. Within the proposed scheduling approach, we present a heuristic that groups DAG subtasks to lower end-to-end latency, and an algorithm for computing an upper bound on latency. Unlike prior work, the purpose of the grouping is not to map subtask groups to physical processors but to generate aggregated entities, each of which can be treated as a single schedulable unit. Evaluation using synthetic task sets shows that our approach can lower processor needs considerably while incurring only a modest increase in latency. In contrast, most prior work on scheduling recurrent DAGs has targeted distributed-memory multiprocessors and has therefore been concerned mostly with statically mapping DAG subtasks to processors.