Data distribution for dense factorization on computers with memory heterogeneity

Authors:
Alexey Lastovetsky;Ravi Reddy
Affiliations:
School of Computer Science and Informatics, UCD, Belfield, Dublin 4, Ireland;School of Computer Science and Informatics, UCD, Belfield, Dublin 4, Ireland
Venue:
Parallel Computing
Year:
2007

Abstract

In this paper, we study the problem of optimal matrix partitioning for parallel dense factorization on heterogeneous processors. First, we outline existing algorithms that solve the problem under a constant performance model of processors, in which the relative speed of each processor is represented by a positive constant. We also propose a new efficient algorithm, called the Reverse algorithm, that solves the problem under the same constant performance model. We then extend the presented algorithms to the functional performance model, which represents the speed of a processor by a continuous function of the task size. This model, in particular, accounts for memory heterogeneity and paging effects, which cause the relative speeds of the processors to vary significantly as the task size increases. We experimentally demonstrate that the functional extension of the Reverse algorithm outperforms the functional extensions of the traditional algorithms.
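
The difference between the two performance models can be illustrated with a minimal sketch. It is not the Reverse algorithm or any other algorithm from the paper, and it ignores the fact that the matrix shrinks at each step of a dense factorization. It only distributes `n` equal units of work once, first in proportion to constant speeds and then so that the estimated times `x / s(x)` are roughly equal under speed functions. All names, the two sample speed functions, and the assumption that `x / s(x)` does not decrease as `x` grows are illustrative and not taken from the paper.

```python
from typing import Callable, List

SpeedFn = Callable[[float], float]


def partition_constant(n: int, speeds: List[float]) -> List[int]:
    """Split n units in proportion to constant relative speeds."""
    total = sum(speeds)
    shares = [int(n * s / total) for s in speeds]  # rounded-down proportional shares
    # Give each leftover unit to the processor that would finish it earliest.
    for _ in range(n - sum(shares)):
        i = min(range(len(speeds)), key=lambda k: (shares[k] + 1) / speeds[k])
        shares[i] += 1
    return shares


def partition_functional(n: int, speed_fns: List[SpeedFn]) -> List[int]:
    """Split n units so that the times x / s(x) are approximately equal.

    Assumption (illustrative): x / s(x) is nondecreasing in x and s(x) > 0.
    """
    def max_units(s: SpeedFn, t: float) -> int:
        # Largest integer x in [0, n] whose estimated time x / s(x) is <= t.
        lo, hi = 0, n
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if mid / s(mid) <= t:
                lo = mid
            else:
                hi = mid - 1
        return lo

    # Bisect on the common finishing time t. The upper bound is the time the
    # best single processor would need to do all n units by itself.
    t_lo, t_hi = 0.0, min(n / s(n) for s in speed_fns)
    for _ in range(100):
        t = (t_lo + t_hi) / 2
        if sum(max_units(s, t) for s in speed_fns) >= n:
            t_hi = t
        else:
            t_lo = t

    shares = [max_units(s, t_hi) for s in speed_fns]
    # Integer rounding can leave a surplus; take it from the slowest finisher.
    for _ in range(sum(shares) - n):
        i = max(
            range(len(shares)),
            key=lambda k: shares[k] / speed_fns[k](shares[k]) if shares[k] else -1.0,
        )
        shares[i] -= 1
    return shares


# Hypothetical speed functions (units of work per second).
def large_memory(x: float) -> float:
    return 100.0  # the task always fits in memory, so the speed stays flat


def small_memory(x: float) -> float:
    # Full speed up to 40 units, then a decay that stands in for paging.
    return 100.0 if x <= 40 else 100.0 * 40 / x


if __name__ == "__main__":
    print(partition_constant(100, [100.0, 100.0]))
    print(partition_functional(100, [large_memory, small_memory]))
```

With these sample functions, both processors look identical when measured on a small task, so the constant model splits the work evenly. The functional model shifts work toward the processor whose speed does not degrade as its share grows.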