A bridging model for parallel computation
Communications of the ACM
Can parallel algorithms enhance serial implementation?
Communications of the ACM
Truly efficient parallel algorithms: 1-optimal multisearch for an extension of the BSP model
ESA '95 Selected papers from the third European symposium on Algorithms
External-memory graph algorithms
Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms
The Parallel Evaluation of General Arithmetic Expressions
Journal of the ACM (JACM)
Optimal organizations for pipelined hierarchical memories
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Towards a theory of cache-efficient algorithms
Journal of the ACM (JACM)
On the Effectiveness of D-BSP as a Bridging Model of Parallel Computation
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
A Characterization of Temporal Locality and Its Portability across Memory Hierarchies
ICALP '01 Proceedings of the 28th International Colloquium on Automata, Languages and Programming,
Submachine Locality in the Bulk Synchronous Setting (Extended Abstract)
Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing-Volume II
BSP-Like External-Memory Computation
CIAC '97 Proceedings of the Third Italian Conference on Algorithms and Complexity
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Cache-oblivious simulation of parallel programs
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hi-index | 0.01 |
In this work, we show that the submachine locality exposed by hierarchical bulk-synchronous computations can be efficiently turned into locality of reference on arbitrarily deep hierarchies. Specifically, we develop efficient schemes to simulate parallel programs written for the Decomposable BSP (a BSP variant which features a hierarchical decomposition into submachines) on the sequential Hierarchical Memory Model (HMM), which rewards the exploitation of temporal locality, and on its extension with block transfer, the BT model, which also rewards the exploitation of spatial locality. The simulations yield good hierarchy-conscious sequential algorithms from parallel ones, and provide evidence of the strict relation between submachine locality in parallel computation and locality of reference in the hierarchical memory setting. We also devise a generalization of the HMM result to the self-simulation of D-BSP augmented with hierarchical memory modules, which yields an interesting analog of Brent's lemma, thus proving that the enhanced model features a seamless integration of memory and network hierarchies.