In this paper we study the problem of simulating shared memory on the distributed memory machine (DMM). Our approach uses multiple copies of shared memory cells, distributed among the memory modules of the DMM via universal hashing. The main aim is to design strategies that resolve contention at the memory modules. Extending results and methods from random graphs and very fast randomized algorithms, we present new simulation techniques that enable us to improve the previously best results exponentially. In particular, we show that an $n$-processor CRCW PRAM can be simulated by an $n$-processor DMM with delay $O(\log\log\log n \log^* n)$, with high probability.

Next we describe a general technique that can be used to turn these simulations into time-processor optimal ones when the machine to be simulated is an EREW PRAM. We obtain a time-processor optimal simulation of an $(n \log\log\log n \log^* n)$-processor EREW PRAM on an $n$-processor DMM with delay $O(\log\log\log n \log^* n)$, with high probability. When an $(n \log\log\log n \log^* n)$-processor CRCW PRAM is simulated, the delay is larger only by a factor of $\log^* n$.

We further demonstrate that the simulations presented cannot be significantly improved using our techniques. We show an $\Omega(\log\log\log n / \log\log\log\log n)$ lower bound on the expected delay for a class of PRAM simulations, called topological simulations, that covers all previously known simulations as well as the simulations presented in the paper.
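
The idea of storing several copies of each cell under independent hash functions can be illustrated with a small sketch. It is not the paper's simulation: the hash family `h(x) = ((a*x + b) mod P) mod n`, the prime `P`, the helper names, and the sequential "serve at the least-loaded copy" rule are all assumptions made for illustration, standing in for the parallel contention-resolution protocols the abstract refers to.

```python
import random

# Assumed prime modulus (2^31 - 1), larger than any address used below.
P = 2_147_483_647


def make_hash(n_modules, rng):
    """Draw one function h(x) = ((a*x + b) mod P) mod n_modules (hypothetical family)."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % n_modules


def place_copies(addresses, hashes):
    """Map each address to the modules holding its copies, one per hash function.

    Two copies of the same cell may land on the same module; this sketch
    does not try to prevent that.
    """
    return {x: [h(x) for h in hashes] for x in addresses}


def greedy_loads(addresses, placement, n_modules):
    """Serve each request at the currently least-loaded module holding a copy.

    This is a sequential stand-in for contention resolution, not a parallel protocol.
    """
    load = [0] * n_modules
    for x in addresses:
        target = min(placement[x], key=lambda m: load[m])
        load[target] += 1
    return load


def simulate(n, copies, seed=0):
    """One step in which n processors each request a distinct cell."""
    rng = random.Random(seed)
    hashes = [make_hash(n, rng) for _ in range(copies)]
    addresses = rng.sample(range(10 * n), n)  # n distinct requested cells
    placement = place_copies(addresses, hashes)
    return placement, greedy_loads(addresses, placement, n)


if __name__ == "__main__":
    for c in (1, 2, 3):
        _, load = simulate(1024, c, seed=42)
        print(f"copies={c}: max module load = {max(load)}")
```

Running the script prints the maximum module load for one, two, and three copies per cell. The maximum load is only a rough proxy for the contention that the module-level strategies must resolve, and the numbers it produces say nothing about the delay bounds stated above.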