Journal of Parallel and Distributed Computing
Many computational solutions can be expressed as directed acyclic graphs (DAGs), in which nodes represent tasks to be executed and edges represent precedence constraints between them. A fundamental challenge in parallel computing is to schedule such DAGs onto multicore processors while preserving these precedence constraints. In this paper, we propose a lightweight scheduling method for DAG structured computations on multicore processors. We distribute the scheduling activities across the cores and let the schedulers collaborate with each other to balance the workload. In addition, we develop a software lock-free local task list for the schedulers to reduce the scheduling overhead. We experimentally evaluated the proposed method by comparing it with various baseline methods on state-of-the-art multicore processors. For a representative set of DAG structured computations from both synthetic and real problems, the proposed scheduler with lock-free local task lists achieved an average speedup of 15.12x on a platform with four quad-core processors (16 cores in total). Lock-based baseline methods achieved 8.77x on the same platform. The observed scheduling overhead of the proposed scheduler was less than 1% of the overall execution time.
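The abstract does not describe the scheduler's internals. The following is only a minimal illustrative sketch of the two ideas it names, and it is not the authors' implementation. It makes these assumptions:

- Each core's "lock-free local task list" is a Treiber-style stack updated with compare-and-swap.
- "Collaboration" between schedulers is simple work stealing.
- Precedence constraints are enforced by atomic per-task counters of unfinished predecessors.

The names `Task`, `TaskList` and `run_dag` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical DAG node: the work to run plus the indices of its successors.
struct Task {
    std::function<void()> work;
    std::vector<std::size_t> successors;
    std::atomic<int> pending{0};  // number of unfinished predecessors
    Task* next = nullptr;         // intrusive link used by TaskList
};

// Lock-free LIFO task list (Treiber stack) using CAS on the head pointer.
// Each task is pushed exactly once and nodes are never freed while workers
// run, so a node cannot re-enter a list and the ABA problem cannot occur.
class TaskList {
    std::atomic<Task*> head_{nullptr};
public:
    void push(Task* t) {
        Task* old = head_.load();
        do {
            t->next = old;  // link to the current head before publishing
        } while (!head_.compare_exchange_weak(old, t));
    }
    Task* pop() {  // returns nullptr when the list is empty
        Task* old = head_.load();
        while (old && !head_.compare_exchange_weak(old, old->next)) {
        }
        return old;
    }
};

// Runs every task of an acyclic graph on num_workers threads. Each worker owns
// a local TaskList; when it is empty, the worker steals from the others.
void run_dag(std::vector<Task>& dag, unsigned num_workers) {
    assert(num_workers > 0);
    std::vector<TaskList> lists(num_workers);
    std::atomic<std::size_t> remaining{dag.size()};

    // Count predecessors, then seed ready (root) tasks round-robin.
    for (auto& t : dag)
        for (auto s : t.successors) dag[s].pending.fetch_add(1);
    unsigned w = 0;
    for (auto& t : dag)
        if (t.pending.load() == 0) lists[w++ % num_workers].push(&t);

    auto worker = [&](unsigned id) {
        while (remaining.load() > 0) {
            Task* t = lists[id].pop();
            for (unsigned k = 1; !t && k < num_workers; ++k)
                t = lists[(id + k) % num_workers].pop();  // steal
            if (!t) { std::this_thread::yield(); continue; }
            t->work();
            // A successor becomes ready when its last predecessor finishes.
            for (auto s : t->successors)
                if (dag[s].pending.fetch_sub(1) == 1) lists[id].push(&dag[s]);
            remaining.fetch_sub(1);
        }
    };
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_workers; ++i) threads.emplace_back(worker, i);
    for (auto& th : threads) th.join();
}
```

Usage: build a `std::vector<Task>` of fixed size, set each task's `work` and `successors`, then call `run_dag(dag, n)`. Several design choices here are assumptions made for the sketch:

- Newly ready successors go to the finishing worker's own list, which tends to keep related tasks on one core.
- Idle workers scan the other lists in round-robin order.
- Atomics use the default sequentially consistent ordering, for clarity rather than speed.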