In this paper, we study the problem of optimal matrix partitioning for parallel dense factorization on heterogeneous processors. First, we outline existing algorithms that solve the problem using a constant performance model of processors, in which the relative speed of each processor is represented by a positive constant. We also propose a new efficient algorithm, called the Reverse algorithm, that solves the problem under the constant performance model. We then extend the presented algorithms to the functional performance model, which represents the speed of a processor by a continuous function of the task size. In particular, this model accounts for memory heterogeneity and paging effects, which cause the relative speeds of the processors to vary significantly as the task size increases. We experimentally demonstrate that the functional extension of the Reverse algorithm outperforms the functional extensions of the traditional algorithms.
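
To make the two performance models concrete, the following Python sketch distributes `n` equal work units (for example, column panels of the matrix) among processors whose speed is given as a function of task size. It is only an illustration under stated assumptions: it is a generic greedy scheme, not the Reverse algorithm or any other algorithm from the paper, and the names `partition`, `constant_speed`, and `paged` are hypothetical. A constant performance model is the special case where the speed function ignores its argument.

```python
from typing import Callable, List, Sequence

# A speed function maps a task size (number of work units) to a speed
# in units per second. This representation is an assumption of the sketch.
SpeedFn = Callable[[int], float]


def constant_speed(s: float) -> SpeedFn:
    """Constant performance model: speed is a positive constant."""
    return lambda size: s


def partition(n: int, speeds: Sequence[SpeedFn]) -> List[int]:
    """Greedily assign n work units, one at a time.

    Each unit goes to the processor whose execution time would be
    smallest after receiving it, where time = size / speed(size).
    Ties are broken in favor of the lowest processor index.
    """
    alloc = [0] * len(speeds)
    for _ in range(n):
        best = min(
            range(len(speeds)),
            key=lambda i: (alloc[i] + 1) / speeds[i](alloc[i] + 1),
        )
        alloc[best] += 1
    return alloc


# Constant model: three processors with relative speeds 1, 2, and 3.
constant_result = partition(6, [constant_speed(1.0), constant_speed(2.0), constant_speed(3.0)])

# Functional model: a hypothetical processor that is fast while the task
# fits in memory (size <= 3) and slows down sharply once paging starts.
paged: SpeedFn = lambda size: 4.0 if size <= 3 else 1.0
functional_result = partition(8, [paged, constant_speed(2.0)])

print(constant_result)    # [1, 2, 3]
print(functional_result)  # [3, 5]
```

In the second call, the nominally faster processor receives fewer units because its speed drops once its share exceeds the assumed memory limit. This is the kind of effect a single constant speed cannot express.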