Seamless Integration of Parallelism and Memory Hierarchy

Authors: Carlo Fantozzi; Andrea Pietracaprina; Geppino Pucci
Venue: ICALP '02: Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Year: 2002

References (11):

A model for hierarchical memory. STOC '87: Proceedings of the nineteenth annual ACM symposium on Theory of computing.
A bridging model for parallel computation. Communications of the ACM.
Efficient external memory algorithms by simulating coarse-grained parallel algorithms. Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures.
The Parallel Evaluation of General Arithmetic Expressions. Journal of the ACM (JACM).
On the Effectiveness of D-BSP as a Bridging Model of Parallel Computation. ICCS '01: Proceedings of the International Conference on Computational Science-Part II.
Reducing I/O Complexity by Simulating Coarse Grained Parallel Algorithms. IPPS '99/SPDP '99: Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing.
Implementing Shared Memory on Clustered Machines. IPDPS '01: Proceedings of the 15th International Parallel & Distributed Processing Symposium.
A Characterization of Temporal Locality and Its Portability across Memory Hierarchies. ICALP '01: Proceedings of the 28th International Colloquium on Automata, Languages and Programming.
Submachine Locality in the Bulk Synchronous Setting (Extended Abstract). Euro-Par '96: Proceedings of the Second International Euro-Par Conference on Parallel Processing-Volume II.
BSP-Like External-Memory Computation. CIAC '97: Proceedings of the Third Italian Conference on Algorithms and Complexity.
A Quantitative Measure of Portability with Application to Bandwidth-Latency Models for Parallel Computing. Euro-Par '99: Proceedings of the 5th International Euro-Par Conference on Parallel Processing.

Cited by (1):

The potential of on-chip multiprocessing for QCD machines. HiPC'05: Proceedings of the 12th international conference on High Performance Computing.

Abstract

We prove an analogue of Brent's lemma for BSP-like parallel machines featuring a hierarchical structure for both the interconnection and the memory. Specifically, for these machines we present a uniform scheme to simulate any computation designed for v processors on a v′-processor configuration with v′ ≤ v and the same overall memory size. For a wide class of computations the simulation exhibits optimal O(v/v′) slowdown. The simulation strategy aims at translating communication locality into temporal locality. As an important special case (v′ = 1), our simulation can be employed to obtain efficient hierarchy-conscious sequential algorithms from efficient fine-grained ones.
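The abstract's central claim is a Brent-style trade-off: a computation written for v processors runs on v′ processors with O(v/v′) slowdown. As a hedged illustration only, the Python sketch below shows the plain folding idea behind Brent's lemma in a BSP-like superstep model. Each of the v′ physical processors executes a contiguous block of v/v′ virtual processors per superstep. It does not model the memory hierarchy, communication costs, or the paper's strategy for translating communication locality into temporal locality. The names `simulate` and `make_ring_step`, the message-passing step interface, the contiguous block assignment, and the requirement that v′ divide v are all assumptions made for this sketch.

```python
from typing import Callable, List, Tuple

# Hypothetical interface (not from the paper): a virtual processor's step maps
# (pid, state, inbox) to (new_state, outbox), where outbox holds (dest_pid, msg).
Step = Callable[[int, int, List[int]], Tuple[int, List[Tuple[int, int]]]]


def simulate(step: Step, v: int, v_prime: int, init: List[int],
             supersteps: int) -> Tuple[List[int], List[int]]:
    """Fold a v-processor superstep program onto v_prime physical processors.

    Sketch of plain Brent-style folding: physical processor p hosts the
    contiguous block of k = v // v_prime virtual processors [p*k, (p+1)*k).
    No memory hierarchy or communication cost is modeled.
    Returns the final virtual states and, for each superstep, the largest
    number of virtual steps executed by any one physical processor.
    """
    if v_prime < 1 or v_prime > v or v % v_prime != 0:
        raise ValueError("this sketch assumes 1 <= v_prime <= v and v_prime | v")
    k = v // v_prime
    states = list(init)  # copy so the caller's list is not mutated
    inboxes: List[List[int]] = [[] for _ in range(v)]
    loads: List[int] = []
    for _ in range(supersteps):
        outgoing: List[Tuple[int, int]] = []
        work = [0] * v_prime
        for p in range(v_prime):
            # Physical processor p runs its k virtual processors one after another
            # (here the physical processors themselves are also run sequentially).
            for pid in range(p * k, (p + 1) * k):
                states[pid], out = step(pid, states[pid], inboxes[pid])
                outgoing.extend(out)
                work[p] += 1
        # Barrier: messages become visible only in the next superstep.
        inboxes = [[] for _ in range(v)]
        for dest, msg in outgoing:
            inboxes[dest].append(msg)
        loads.append(max(work))
    return states, loads


def make_ring_step(v: int) -> Step:
    """Hypothetical example program: add incoming values, pass the result right."""
    def ring_step(pid: int, state: int, inbox: List[int]):
        new_state = state + sum(inbox)
        return new_state, [((pid + 1) % v, new_state)]
    return ring_step


if __name__ == "__main__":
    v = 8
    init = list(range(v))
    ring = make_ring_step(v)
    reference, _ = simulate(ring, v, v, init, supersteps=3)
    for v_prime in (4, 2, 1):
        states, loads = simulate(ring, v, v_prime, init, supersteps=3)
        # Same result as the v-processor run; per-superstep load is v // v_prime.
        print(v_prime, states == reference, loads)
```

With v = 8, every folded run should reproduce the 8-processor states, while the reported load per superstep grows as v/v′ (2, 4 and 8 virtual steps for v′ = 4, 2, 1). The case v′ = 1 corresponds to the sequential special case mentioned in the abstract. The sketch deliberately ignores the memory-hierarchy costs that the abstract's simulation strategy is designed to control.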