Data transfers on the fly for hierarchical systems of chip multi-processors
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
The paper presents a study of medium- and coarse-grain numerical computations in a new cluster-based shared memory parallel architecture oriented toward implementation in "Systems on Chip" (SoC) technology. The assumed architecture is based on dynamic processor clusters organized around shared memory modules. Fast transfers of shared data between processors in different clusters are performed through communication on the fly, which combines processor switching between clusters with intra-cluster data reads on the fly. Dynamic processor clusters are implemented inside SoC modules, which are additionally connected by a global inter-cluster network. The paper discusses the speedup and parallelization efficiency of parallel matrix multiplication, estimated by symbolic execution of program graphs. Simulation results are presented for algorithms with two kinds of data decomposition: recursive division of matrices into quadrants and division of matrices into stripes. In the quadrant-based method, elementary square sub-matrix multiplications are performed using the serial Strassen method, and communication on the fly is applied. The experiments reveal much higher efficiency for the proposed quadrant-based matrix multiplication method than for the stripe method, which is considered very efficient in conventional parallel shared memory systems.
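The quadrant-based decomposition described above can be illustrated with a minimal serial sketch. It is not the authors' implementation: the function names (`quadrant_multiply`, `strassen`) and the `leaf_size` parameter are hypothetical, matrices are assumed to be square with a power-of-two size, and the mapping of sub-products onto processor clusters and the communication on the fly are not modeled. The sketch only shows how recursive division into quadrants yields independent sub-matrix products, each of which is computed at the leaf level with the serial Strassen method.

```python
def add(A, B):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Divide a square matrix into four quadrants (M11, M12, M21, M22)."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def join(C11, C12, C21, C22):
    """Reassemble four quadrants into one square matrix."""
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


def strassen(A, B):
    """Serial Strassen multiplication (size assumed to be a power of two)."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products instead of the eight of the classical method.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    return join(C11, C12, C21, C22)


def quadrant_multiply(A, B, leaf_size):
    """Recursive quadrant decomposition with Strassen at the leaves.

    The eight sub-products at each level are mutually independent; in a
    parallel setting they are the units that could be assigned to
    different processors (an assumption of this sketch, run serially here).
    """
    if len(A) <= leaf_size:
        return strassen(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    def q(X, Y):
        return quadrant_multiply(X, Y, leaf_size)

    return join(add(q(A11, B11), q(A12, B21)),
                add(q(A11, B12), q(A12, B22)),
                add(q(A21, B11), q(A22, B21)),
                add(q(A21, B12), q(A22, B22)))
```

For example, `quadrant_multiply(A, B, leaf_size=2)` on two 4x4 matrices performs one level of quadrant division and then eight Strassen multiplications of 2x2 blocks. A stripe decomposition would instead assign whole row bands of the result to each task, so each task would need every column of the second operand.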