Multi-CMP system with data communication on the fly
The Journal of Supercomputing
Dynamic SMP clusters in SoC technology – towards massively parallel fine grain numerics
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Scheduling moldable tasks for dynamic SMP clusters in SoC technology
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Scheduling architecture-supported regions in parallel programs
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I
Data transfers on the fly for hierarchical systems of chip multi-processors
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
Scheduling parallel programs based on architecture-supported regions
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II
Parallel matrix multiplication based on dynamic SMP clusters in SoC technology
ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking
The paper presents a new architecture for systems built from shared-memory processor clusters that are reconfigured at run time and are intended for implementation in network-on-chip technology. The clusters form local data-exchange sub-networks that dynamically connect processors to shared memory modules. These sub-networks allow data held in one processor's data cache to be exposed so that other processors can read it into their own data caches. This inter-processor data exchange paradigm, called "communication on the fly", enables direct communication between processor data caches. Dual-ported data caches are assumed, so that data can be read and written in parallel between the caches and the memory modules. In the proposed architecture, programs are executed according to a cache-controlled macro data flow execution model. Computational tasks are defined so as to eliminate reloading of data caches during task execution. A special macro data flow graph representation of programs enables modeling of program behaviour under different assumptions about the architecture and the program structure. The paper presents simulation results from symbolic execution of program graphs for matrix multiplication. They show the suitability of the proposed architecture for very fine-grain parallel computations.
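The "communication on the fly" idea described in the abstract can be illustrated with a toy model. The sketch below is not the paper's simulator: the names `Processor`, `Cluster`, `write_on_the_fly`, `write_then_read` and `build_cluster` are hypothetical, and counting one transfer per memory access is a simplifying assumption made only to contrast the two exchange styles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Processor:
    """Hypothetical processor with a private data cache (address -> value)."""
    name: str
    cache: Dict[str, float] = field(default_factory=dict)


@dataclass
class Cluster:
    """Hypothetical cluster: one writer and several readers on one memory module."""
    memory: Dict[str, float] = field(default_factory=dict)
    writer: Optional[Processor] = None
    readers: List[Processor] = field(default_factory=list)

    def write_on_the_fly(self, address: str, value: float) -> int:
        """Writer exposes a value; readers capture it during the same transfer.

        Returns the number of transfers in this toy model (assumed to be 1).
        """
        if self.writer is None:
            raise ValueError("cluster has no writer")
        self.writer.cache[address] = value
        self.memory[address] = value
        for reader in self.readers:
            # Readers fill their caches while the data passes through the sub-network.
            reader.cache[address] = value
        return 1

    def write_then_read(self, address: str, value: float) -> int:
        """Baseline for comparison: write to memory, then each reader loads separately."""
        if self.writer is None:
            raise ValueError("cluster has no writer")
        self.writer.cache[address] = value
        self.memory[address] = value
        transfers = 1
        for reader in self.readers:
            reader.cache[address] = self.memory[address]
            transfers += 1  # one extra memory read per reader (toy assumption)
        return transfers


def build_cluster(n_readers: int) -> Cluster:
    """Create a cluster with one writer and n_readers readers."""
    readers = [Processor(f"p{i + 1}") for i in range(n_readers)]
    return Cluster(writer=Processor("p0"), readers=readers)


if __name__ == "__main__":
    print(build_cluster(3).write_on_the_fly("c[0][0]", 7.0))
    print(build_cluster(3).write_then_read("c[0][0]", 7.0))
```

Under these assumptions, the baseline needs one additional transfer per reading processor, while the on-the-fly variant delivers the value to all reader caches during the single write. Real timing in the proposed architecture depends on details the abstract does not give.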