Efficient automatic simulation of parallel computation on networks of workstations

Authors:
Christos Kaklamanis;Danny Krizanc;Manuela Montangero;Giuseppe Persiano
Affiliations:
Computer Technology Institute and Department of Computer Engineering and Informatics, University of Patras, GR26500 Rion, Greece;Department of Mathematics and Computer Science, Wesleyan University, Middletown CT 06459, USA;Dipartimento di Ingegneria dell'Informazione, Università di Modena e Reggio Emilia, Via Vignolese 905/b, 41100 Modena, Italy;Dipartimento di Informatica ed Applicazioni, Università di Salerno, 84081 Baronissi (Salerno), Italy
Venue:
Discrete Applied Mathematics
Year:
2006

Citing 11
Cited 0

Efficient dispersal of information for security, load balancing, and fault tolerance

Journal of the ACM (JACM)
Work-preserving emulations of fixed-connection networks

STOC '89 Proceedings of the twenty-first annual ACM symposium on Theory of computing
A bridging model for parallel computation

Communications of the ACM
General purpose parallel architectures

Handbook of theoretical computer science (vol. A)
Computing with faulty arrays

STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing
Multi-scale self-simulation: a technique for reconfiguring arrays with faults

STOC '93 Proceedings of the twenty-fifth annual ACM symposium on Theory of computing
Improved methods for hiding latency in high bandwidth networks (extended abstract)

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Automatic methods for hiding latency in high bandwidth networks (extended abstract)

STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
Asymptotically tight bounds for computing with faulty arrays of processors

SFCS '90 Proceedings of the 31st Annual Symposium on Foundations of Computer Science
On the fault tolerance of some popular bounded-degree networks

SFCS '92 Proceedings of the 33rd Annual Symposium on Foundations of Computer Science
Efficient out-of-core algorithms for linear relaxation using blocking covers

SFCS '93 Proceedings of the 1993 IEEE 34th Annual Foundations of Computer Science

Quantified Score

Hi-index	0.04

Abstract

Andrews et al. [Automatic methods for hiding latency in high bandwidth networks, in: Proceedings of the ACM Symposium on Theory of Computing, 1996, pp. 257-265; Improved methods for hiding latency in high bandwidth networks, in: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, 1996, pp. 52-61] introduced a number of techniques for automatically hiding latency when simulating networks with unit-delay links on networks whose links have arbitrary, unequal delays. In their work, they assume that the processors of the host network are identical in computational power to those of the guest network being simulated. They further assume that the links of the host are able to pipeline messages, i.e., they are able to deliver P packets in time O(P+d), where d is the delay on the link. In this paper we examine the effect of eliminating one or both of these assumptions. In particular, we provide an efficient simulation of a linear array of homogeneous processors connected by unit-delay links on a linear array of heterogeneous processors connected by links with arbitrary delay. We show that the slowdown achieved by our simulation is optimal. We then consider the case of simulating cliques by cliques; i.e., a clique of heterogeneous processors with arbitrary-delay links is used to simulate a clique of homogeneous processors with unit-delay links. We reduce the slowdown from the obvious bound of the maximum link delay to the average of the link delays. In the case of the linear array we consider links both with and without pipelining. For the clique simulation the links are not assumed to support pipelining. The main motivation for our results (as was the case with Andrews et al.) is to mitigate the degradation of performance when executing parallel programs designed for different architectures on a network of workstations (NOW). In such a setting it is unlikely that the links provided by the NOW will support pipelining, and it is quite probable that the processors will be heterogeneous. Combining our result on clique simulation with well-known techniques for simulating shared-memory PRAMs on distributed-memory machines provides an effective automatic compilation of a PRAM algorithm on a NOW.
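
As a rough illustration of the quantities the abstract refers to, the sketch below contrasts the delivery time of P packets on a pipelining link (P + d steps, matching the O(P+d) bound above) with a link that does not pipeline, and compares the maximum link delay with the average. The non-pipelined cost model (one packet in flight at a time, so P * d steps), the function names, and the sample delays are assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean


def pipelined_time(packets: int, delay: int) -> int:
    """Steps to deliver `packets` packets over a pipelining link of delay
    `delay`: one packet enters the link per step, the last arrives after
    `delay` more steps. This is the P + d behind the abstract's O(P + d)."""
    return packets + delay


def unpipelined_time(packets: int, delay: int) -> int:
    """ASSUMED model, not from the paper: without pipelining only one packet
    is in flight at a time, so each packet costs a full `delay` steps."""
    return packets * delay


def slowdown_bounds(link_delays: list[int]) -> tuple[int, float]:
    """Return (maximum delay, average delay) for a set of link delays.
    The abstract contrasts the obvious max-delay slowdown bound with a
    bound given by the average of the link delays."""
    return max(link_delays), mean(link_delays)


# Hypothetical clique with one slow link among several fast ones.
sample_delays = [1, 2, 3, 50]
worst, average = slowdown_bounds(sample_delays)
```

With these hypothetical delays a single slow link dominates the maximum, while the average stays much smaller, which is the gap the clique result in the abstract is concerned with.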