Hierarchical heterogeneity exists in many modern high-performance clusters, both between computing nodes and within nodes that include specialized accelerators such as GPUs. Achieving high performance for scientific applications on these platforms requires load balancing. In this paper we present a hierarchical matrix partitioning algorithm based on realistic performance models at each level of the hierarchy. To minimise the total execution time of the application, the algorithm iteratively partitions a matrix between nodes and then partitions each resulting sub-matrix between the devices within a node. The algorithm is self-adaptive: it builds the performance models dynamically at run time, and it also employs an algorithm that minimises the total volume of communication. It allows scientific applications to perform load-balanced matrix operations with nested parallelism on hierarchical heterogeneous platforms. To demonstrate its effectiveness, we applied it to a fundamental operation in parallel scientific computing: matrix multiplication. Large-scale experiments on a heterogeneous multi-cluster site incorporating multicore CPUs and GPU nodes show that the presented algorithm outperforms current state-of-the-art approaches and successfully load-balances very large problems.
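The two-level idea above (partition between nodes, then between the devices inside each node, so that all devices finish at about the same time) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on assumptions of our own: the matrix is reduced to a count of equal work units (for example, blocks), each device is described by a hypothetical speed function `s(x)` (units per second at problem size `x`) instead of a model built from run-time measurements, the predicted time `x / s(x)` is assumed to grow with `x`, and the 2D sub-matrix shapes and the communication-minimising step are omitted. A node's capacity within a time budget `t` is taken to be the sum of its devices' capacities; the helper names (`device_capacity`, `node_capacity`, `split`, `hierarchical_split`) are hypothetical.

```python
from typing import Callable, List

Speed = Callable[[float], float]     # workload size x -> units processed per second
Capacity = Callable[[float], float]  # time budget t (s) -> largest workload finished within t


def device_capacity(speed: Speed, limit: float, iters: int = 50) -> Capacity:
    """Invert the predicted time x / speed(x) by bisection over [0, limit].

    Assumes x / speed(x) increases with x and that limit > 0.
    """
    def cap(t: float) -> float:
        if limit / speed(limit) <= t:
            return limit
        lo, hi = 0.0, limit
        for _ in range(iters):
            mid = (lo + hi) / 2
            if mid / speed(mid) <= t:
                lo = mid
            else:
                hi = mid
        return lo
    return cap


def node_capacity(device_caps: List[Capacity]) -> Capacity:
    # With the load balanced inside the node, the node completes within t
    # whatever its devices together complete within t.
    return lambda t: sum(c(t) for c in device_caps)


def split(caps: List[Capacity], total: float, iters: int = 50) -> List[float]:
    """Divide `total` so that every unit is predicted to finish at about the same time."""
    if total <= 0:
        return [0.0] * len(caps)
    # Upper bound on the common finishing time by doubling; this terminates as
    # long as total does not exceed the sum of the capacity limits.
    hi = 1.0
    while sum(c(hi) for c in caps) < total:
        hi *= 2
    lo = 0.0
    for _ in range(iters):  # bisection on the common finishing time t
        mid = (lo + hi) / 2
        if sum(c(mid) for c in caps) < total:
            lo = mid
        else:
            hi = mid
    shares = [c(hi) for c in caps]
    scale = total / sum(shares)  # absorb the small residual bisection error
    return [s * scale for s in shares]


def hierarchical_split(nodes: List[List[Speed]], total: float) -> List[List[float]]:
    """Level 1 splits `total` between nodes; level 2 splits each node's share between its devices."""
    dev_caps = [[device_capacity(s, total) for s in devices] for devices in nodes]
    node_shares = split([node_capacity(caps) for caps in dev_caps], total)
    return [split(caps, share) for caps, share in zip(dev_caps, node_shares)]


if __name__ == "__main__":
    # Hypothetical speed models: node 0 has a flat-rate CPU and a GPU whose
    # speed grows with problem size; node 1 has a single faster CPU.
    cluster = [
        [lambda x: 10.0, lambda x: 100.0 * x / (x + 50.0)],
        [lambda x: 20.0],
    ]
    for i, shares in enumerate(hierarchical_split(cluster, 1000.0)):
        print(f"node {i}: " + ", ".join(f"{s:.1f}" for s in shares))
```

For these example models the GPU's predicted time is `(x + 50) / 100`, so the balanced finishing time solves `10t + (100t - 50) + 20t = 1000`, giving `t = 1050/130 ≈ 8.08` s and shares of about 80.8 and 757.7 units on node 0 and 161.5 units on node 1. Without communication costs both levels converge on the same finishing time; rounding shares to whole blocks is left out of the sketch.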