Dynamic SMP clusters in SoC technology – towards massively parallel fine grain numerics

Authors:
Marek Tudruj; Lukasz Masko
Affiliations:
Institute of Computer Science of the Polish Academy of Sciences, Warsaw, Poland; Institute of Computer Science of the Polish Academy of Sciences, Warsaw, Poland
Venue:
PPAM'05 Proceedings of the 6th International Conference on Parallel Processing and Applied Mathematics
Year:
2005

Abstract

This paper evaluates new architectural solutions for data communication in shared-memory parallel systems. These solutions enable the creation of run-time reconfigurable processor clusters with very efficient inter-processor data exchange. Data brought into the data cache of a processor that enters a cluster can be transparently intercepted by many processors in the cluster. Direct communication between processor caches is possible, which eliminates standard data transactions. The system provides simultaneous connections of processors with many memory modules, which further increases the potential for parallel inter-cluster data exchange. System-on-chip (SoC) technology is applied. Special program macro-data flow graphs enable proper structuring of program execution control, including the specification of parallel execution, data cache operations, switching of processors between clusters, and multiple parallel reads of data on the fly. Simulation results from symbolic execution of graphs of fine-grain numerical algorithms illustrate the high efficiency and suitability of the proposed architecture for massively parallel fine-grain numerical computations.
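
The idea of reads on the fly can be illustrated with a toy software model. The sketch below is not the authors' simulator or hardware design. It only assumes that a processor joining a cluster keeps its cache contents, and that other cluster members may capture a value from the shared bus while it is being written. All names (`Processor`, `Cluster`, `watching`, `run`) are hypothetical and chosen for illustration. The model counts bus transactions with and without on-the-fly capture.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    cache: dict = field(default_factory=dict)    # address -> value
    watching: set = field(default_factory=set)   # addresses to capture on the fly


@dataclass
class Cluster:
    """Processors sharing one memory-module bus (toy model, an assumption)."""
    memory: dict = field(default_factory=dict)
    members: list = field(default_factory=list)
    bus_transactions: int = 0

    def join(self, proc):
        # A processor switches into the cluster and keeps its cache contents.
        self.members.append(proc)

    def write(self, writer, addr):
        # One bus transaction: the writer's cached value goes to memory, and
        # every other member watching this address captures it in passing.
        value = writer.cache[addr]
        self.bus_transactions += 1
        self.memory[addr] = value
        for p in self.members:
            if p is not writer and addr in p.watching:
                p.cache[addr] = value
                p.watching.discard(addr)

    def read(self, reader, addr):
        # Standard read: one extra bus transaction per reader on a cache miss.
        if addr not in reader.cache:
            self.bus_transactions += 1
            reader.cache[addr] = self.memory[addr]
        return reader.cache[addr]


def run(on_the_fly, n_readers=3):
    """Return (values seen by the readers, number of bus transactions)."""
    cluster = Cluster()
    readers = [Processor(f"P{i}") for i in range(n_readers)]
    for r in readers:
        cluster.join(r)
        if on_the_fly:
            r.watching.add("x")
    # A processor that already holds "x" in its cache enters the cluster.
    producer = Processor("P_new", cache={"x": 42})
    cluster.join(producer)
    cluster.write(producer, "x")
    values = [cluster.read(r, "x") for r in readers]
    return values, cluster.bus_transactions


if __name__ == "__main__":
    print("standard  :", run(on_the_fly=False))
    print("on the fly:", run(on_the_fly=True))
```

In this model every reader ends up with the same value in both modes, but the standard mode needs one write plus one read per consumer, while the on-the-fly mode needs only the single write. Whether the real architecture achieves a comparable saving depends on details that the abstract does not give.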