In this paper, we address the implementation of matrix multiplication on heterogeneous platforms. We target two classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to balance the load across resources of different speeds while minimizing the communication volume. We formally state this problem in a geometric framework and prove its NP-completeness. Next, we introduce a polynomial-time column-based heuristic, which proves very satisfactory: we derive a theoretical performance guarantee for the heuristic and assess its practical usefulness through MPI experiments.
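The abstract does not spell out the geometric formulation or the heuristic, so the following is only an illustrative sketch under stated assumptions. It assumes that each processor receives a rectangle of the unit square whose area is proportional to its speed, and that the communication volume is modeled as the sum of the rectangles' half-perimeters (width plus height). It further assumes that a column-based layout stacks rectangles in vertical columns, with each column taking consecutive areas in sorted order. The function name `column_partition` and the dynamic program are hypothetical choices made for this example, not the authors' published algorithm.

```python
from itertools import accumulate


def column_partition(speeds):
    """Search column-based layouts of the unit square (illustrative sketch).

    Assumptions (not taken from the abstract): areas are proportional to
    speeds, cost is the sum of half-perimeters, and each column holds a
    run of consecutive areas in ascending order.
    Returns (cost, columns), where columns is a list of lists of areas.
    """
    total = sum(speeds)
    areas = sorted(s / total for s in speeds)
    p = len(areas)
    prefix = [0.0] + list(accumulate(areas))

    # best[q] = (minimal cost for the first q areas, start index of last column)
    best = [(0.0, None)] + [(float("inf"), None)] * p
    for q in range(1, p + 1):
        for r in range(q):
            width = prefix[q] - prefix[r]
            # A column of width w holding k rectangles contributes k * w
            # (the widths) plus 1 (the heights, which add up to the full side).
            cost = best[r][0] + (q - r) * width + 1.0
            if cost < best[q][0]:
                best[q] = (cost, r)

    # Walk the split points backwards to recover the columns.
    columns = []
    q = p
    while q > 0:
        r = best[q][1]
        columns.append(areas[r:q])
        q = r
    columns.reverse()
    return best[p][0], columns


def half_perimeter_lower_bound(speeds):
    """Lower bound 2 * sum(sqrt(area)): each rectangle is at best a square."""
    total = sum(speeds)
    return 2.0 * sum((s / total) ** 0.5 for s in speeds)


if __name__ == "__main__":
    cost, columns = column_partition([1, 2, 3, 4])
    print(cost, columns, half_perimeter_lower_bound([1, 2, 3, 4]))
```

With four processors of equal speed, this sketch selects two columns of two squares each, and its cost matches the lower bound. With unequal speeds the cost generally exceeds the bound, which is the kind of gap a performance guarantee would quantify.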