We consider the problem of executing large BLAS (Basic Linear Algebra Subprograms) Level-2 operations, such as matrix-vector products, on a bus-oriented workstation cluster in a network-based distributed computing environment. In contrast to previous contributions, we take into account that workstations, unlike mainframe computers, are not equipped with communication coprocessors or front-ends, which precludes any off-loading of communication. Communication delays, which are significant in workstation clusters because of limited bandwidth, are accounted for explicitly; most performance analyses of parallel computing systems ignore them. The main contribution of this study is to show that the optimal load partitioning, and the resulting performance of the network, depend critically on network bandwidth, computing capacity, and load characteristics. We design load distribution strategies for three cases (no communication, broadcast communication, and multicast communication) based on closed-form solutions of the optimal load partitioning problem, and we present an extensive asymptotic analysis with respect to several parameters of the load and the system. Necessary and sufficient conditions for feasible and optimal load sharing are also derived. Finally, we present a trade-off study between the optimal number of workstations and the bandwidth of the bus.
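To make the idea of load partitioning concrete, the following is a minimal sketch of the simplest of the three cases, the one with no communication. It is not the closed-form solution derived in the paper, which the abstract does not reproduce. It only illustrates the standard observation that, when communication cost is ignored, the makespan of a row-partitioned matrix-vector product is minimized by giving each workstation a share of rows proportional to its speed, so that all workstations finish at about the same time. The function names and the `unit_times` parameter (time per matrix row on each workstation) are assumptions introduced for illustration.

```python
def partition_rows(n_rows, unit_times):
    """Split n_rows among workers in proportion to their speeds.

    unit_times[i] is a hypothetical per-row processing time for worker i
    (an assumption of this sketch, not a quantity defined in the abstract).
    Communication cost is ignored entirely.
    """
    speeds = [1.0 / w for w in unit_times]
    total_speed = sum(speeds)
    # Ideal (fractional) share of rows for each worker.
    ideal = [n_rows * s / total_speed for s in speeds]
    counts = [int(x) for x in ideal]  # round each share down
    # Hand the leftover rows to the workers with the largest remainders.
    leftover = n_rows - sum(counts)
    by_remainder = sorted(range(len(ideal)),
                          key=lambda i: ideal[i] - counts[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def makespan(counts, unit_times):
    """Finish time of the slowest worker for a given row assignment."""
    return max(c * w for c, w in zip(counts, unit_times))


if __name__ == "__main__":
    unit_times = [1.0, 2.0, 4.0]  # three workers of decreasing speed
    balanced = partition_rows(1000, unit_times)
    equal = [334, 333, 333]       # naive equal split, for comparison
    print(balanced, makespan(balanced, unit_times))
    print(equal, makespan(equal, unit_times))
```

Once communication over a shared bus is taken into account, as in the broadcast and multicast cases, the proportional split above is in general no longer optimal, which is the situation the closed-form solutions in the paper address.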