We consider parallel computing on a network of workstations that uses a connection-oriented protocol (e.g., Asynchronous Transfer Mode) for data communication. In a connection-oriented protocol, a virtual circuit of guaranteed bandwidth is established for each pair of communicating workstations. Because not all virtual circuits have the same guaranteed bandwidth, a parallel application must cope with unequal bandwidths between workstations. Most work on the design of parallel algorithms assumes equal bandwidth on all communication links, so such algorithms often perform poorly when executed on networks of workstations using connection-oriented protocols. In this paper, we first evaluate the performance degradation that unequal bandwidths cause in conventional parallel algorithms such as the fast Fourier transform and bitonic sort. We then present a strategy based on dynamic redistribution of data points to reduce the bottlenecks caused by unequal bandwidths, and we extend this strategy to deal with processor heterogeneity. Using analysis and simulation, we show that the proposed redistribution strategy considerably reduces the runtime. The basic idea presented in this paper can also be used to improve the runtimes of other parallel applications in connection-oriented environments.
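To make the bottleneck concrete, the following is a minimal sketch of why an even data split suffers under unequal virtual-circuit bandwidths and how a bandwidth-aware split can help. It is not the redistribution algorithm of the paper, which the abstract does not specify. It assumes a deliberately simple cost model in which one communication step takes as long as its slowest circuit (data volume divided by bandwidth). The function names `exchange_time` and `proportional_split` and the example bandwidth values are hypothetical, chosen only for illustration.

```python
from typing import List


def exchange_time(points: List[float], bandwidths: List[float]) -> float:
    """Time of one communication step under the assumed model:
    every circuit transfers in parallel, so the slowest one dominates."""
    return max(n / b for n, b in zip(points, bandwidths))


def proportional_split(total_points: int, bandwidths: List[float]) -> List[int]:
    """Split total_points across circuits in proportion to bandwidth.

    Uses largest-remainder rounding so the integer counts sum to total_points.
    This proportional rule is an illustrative assumption, not the paper's method.
    """
    total_bw = sum(bandwidths)
    ideal = [total_points * b / total_bw for b in bandwidths]
    counts = [int(x) for x in ideal]
    leftover = total_points - sum(counts)
    # Hand the remaining points to the circuits with the largest fractional parts.
    order = sorted(range(len(ideal)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    bandwidths = [10.0, 10.0, 2.5, 2.5]  # hypothetical guaranteed bandwidths
    total = 1000                         # hypothetical number of data points

    equal = [total / len(bandwidths)] * len(bandwidths)
    balanced = proportional_split(total, bandwidths)

    print("equal split time:   ", exchange_time(equal, bandwidths))
    print("balanced split:     ", balanced)
    print("balanced split time:", exchange_time(balanced, bandwidths))
```

Under this toy model the even split is limited by the two slow circuits, while the proportional split equalizes the per-circuit transfer times. A real strategy would also have to account for the cost of the redistribution itself and for processor speed, which this sketch ignores.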