We prove an analogue of Brent's lemma for BSP-like parallel machines featuring a hierarchical structure for both the interconnection and the memory. Specifically, for these machines we present a uniform scheme to simulate any computation designed for v processors on a v'-processor configuration with v' ≤ v and the same overall memory size. For a wide class of computations the simulation exhibits optimal O(v/v') slowdown. The simulation strategy aims at translating communication locality into temporal locality. As an important special case (v' = 1), our simulation can be employed to obtain efficient hierarchy-conscious sequential algorithms from efficient fine-grained ones.
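
To make the O(v/v') slowdown concrete, the following is a minimal Python sketch of a naive Brent-style simulation of one superstep. It is not the scheme from the paper: it ignores the memory hierarchy and the cost of communication entirely. The names simulate_superstep, step, and v_prime are hypothetical and introduced only for illustration. Each of the v' hosts is assigned a contiguous block of ceil(v/v') virtual processors and runs them one after another, so a superstep takes ceil(v/v') sequential rounds per host.

```python
from math import ceil


def simulate_superstep(states, step, v_prime):
    """Run one superstep of a v-processor program on v_prime hosts.

    states  : list of length v, the local state of each virtual processor
    step    : hypothetical callback (pid, state) -> (new_state, messages),
              where messages is a list of (dest_pid, payload) pairs
    v_prime : number of hosts, with 1 <= v_prime <= v

    Returns (new_states, inboxes, rounds), where rounds is the number of
    sequential rounds each host performs, i.e. ceil(v / v_prime).
    """
    v = len(states)
    if not 1 <= v_prime <= v:
        raise ValueError("need 1 <= v_prime <= v")

    block = ceil(v / v_prime)          # virtual processors per host
    new_states = list(states)
    inboxes = [[] for _ in range(v)]   # messages delivered at the barrier

    for r in range(block):             # sequential rounds on each host
        for host in range(v_prime):    # hosts are conceptually parallel
            pid = host * block + r     # r-th virtual processor of this host
            if pid >= v:
                continue               # last host may own a shorter block
            new_states[pid], messages = step(pid, states[pid])
            for dest, payload in messages:
                inboxes[dest].append(payload)

    return new_states, inboxes, block


# Usage: 8 virtual processors, each incrementing its state and sending
# its old state to its right neighbour, simulated on 2 hosts.
def ring_step(pid, state):
    return state + 1, [((pid + 1) % 8, state)]


states, inboxes, rounds = simulate_superstep(list(range(8)), ring_step, 2)
print(rounds)  # number of sequential rounds per host
```

Assigning contiguous blocks means that messages between nearby virtual processors stay within a single host, which loosely mirrors the idea of turning communication locality into locality of reference. Setting v_prime to 1 gives the purely sequential special case.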