Towards an architecture-independent analysis of parallel algorithms
SIAM Journal on Computing
A practical algorithm for exact array dependence analysis
Communications of the ACM
PYRROS: static task scheduling and code generation for message passing multiprocessors
ICS '92 Proceedings of the 6th international conference on Supercomputing
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Task scheduling in parallel and distributed systems
List scheduling with and without communication delays
Parallel Computing
Task Clustering and Scheduling for Distributed Memory Parallel Architectures
IEEE Transactions on Parallel and Distributed Systems
Affine dependence classification for communications minimization
International Journal of Parallel Programming
POEMS: end-to-end performance design of large parallel adaptive computational systems
Proceedings of the 1st international workshop on Software and performance
Compact DAG representation and its dynamic scheduling
Journal of Parallel and Distributed Computing
Static scheduling algorithms for allocating directed task graphs to multiprocessors
ACM Computing Surveys (CSUR)
Sparse LU factorization with partial pivoting on distributed memory machines
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Partitioning and Scheduling Parallel Programs for Multiprocessors
Low-Cost Task Scheduling for Distributed-Memory Machines
IEEE Transactions on Parallel and Distributed Systems
Computers and Intractability: A Guide to the Theory of NP-Completeness
On the Granularity and Clustering of Directed Acyclic Task Graphs
IEEE Transactions on Parallel and Distributed Systems
DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors
IEEE Transactions on Parallel and Distributed Systems
Mapping affine loop nests: new results
HPCN Europe '95 Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking
Clustering and reassignment-based mapping strategy for message-passing architectures
Journal of Systems Architecture: the EUROMICRO Journal
A New Heuristic for Scheduling Parallel Programs on Multiprocessor
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Symbolic Partitioning and Scheduling of Parameterized Task Graphs
ICPADS '98 Proceedings of the 1998 International Conference on Parallel and Distributed Systems
SLC: Symbolic Scheduling for Executing Parameterized Task Graphs on Multiprocessors
ICPP '99 Proceedings of the 1999 International Conference on Parallel Processing
Runtime Parallel Incremental Scheduling of DAGs
ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Low Memory Cost Dynamic Scheduling of Large Coarse Grain Task Graphs
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
DAGuE: A generic distributed DAG engine for High Performance Computing
Parallel Computing
Enabling large-scale scientific workflows on petascale resources using MPI master/worker
Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond
International Journal of High Performance Computing Applications
Task graph scheduling has proven effective for performance prediction and optimization of parallel applications, and many static scheduling algorithms have been proposed for executing task graphs on parallel machines. However, a static approach cannot adapt to changes in the values of program parameters or in the number of processors, and it cannot handle large task graphs. In this paper, we model parallel computation using parameterized task graphs, which represent coarse-grain parallelism independently of the problem size. We present a symbolic scheduling algorithm that first derives linear clusters from a parameterized task graph and then assigns the clusters to processors. The runtime system executes the clusters on each processor in a multi-threaded fashion. Experiments with several scientific computing kernel benchmarks show that our method delivers compact, symbolic schedules whose performance is highly competitive with static approaches.
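To make the two-phase structure of cluster-based scheduling concrete, the sketch below shows one simple (and deliberately generic) instance of the idea: greedily chaining DAG tasks into linear clusters, then mapping the clusters to processors. This is an illustrative assumption of ours, not the paper's symbolic algorithm; the function names, the greedy chaining rule, and the round-robin assignment are all hypothetical placeholders.

```python
# Illustrative sketch only: linear clustering of a task DAG followed by a
# round-robin cluster-to-processor assignment. This is NOT the paper's
# symbolic scheduling algorithm, just a minimal example of the two phases.
from collections import defaultdict

def linear_clusters(tasks, edges):
    """Greedily chain tasks into linear clusters, extending a chain along an
    edge u -> v only when v has a single predecessor (so the chain stays
    linear). `tasks` is assumed to be in topological order."""
    succ = defaultdict(list)
    pred_count = defaultdict(int)
    for u, v in edges:
        succ[u].append(v)
        pred_count[v] += 1

    clustered = set()
    clusters = []
    for t in tasks:
        if t in clustered:
            continue
        chain = [t]
        clustered.add(t)
        cur = t
        while True:
            candidates = [v for v in succ[cur]
                          if pred_count[v] == 1 and v not in clustered]
            if not candidates:
                break
            cur = candidates[0]          # extend the chain along one edge
            chain.append(cur)
            clustered.add(cur)
        clusters.append(chain)
    return clusters

def assign_clusters(clusters, num_procs):
    """Map clusters to processors round-robin (a placeholder policy)."""
    placement = [[] for _ in range(num_procs)]
    for i, cluster in enumerate(clusters):
        placement[i % num_procs].append(cluster)
    return placement
```

For a diamond DAG `a -> {b, c} -> d`, the chaining rule yields the clusters `[['a', 'b'], ['c'], ['d']]`, which round-robin assignment then spreads over the available processors.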