Parallel matrix multiplication based on dynamic SMP clusters in SoC technology

Authors:
Marek Tudruj; Lukasz Masko
Affiliations:
Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland and Polish-Japanese Institute of Information Technology, Warsaw, Poland; Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
Venue:
ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking
Year:
2007

Abstract

The paper concerns a special architecture of dynamic shared-memory processor (SMP) clusters organized at program run-time. The architecture, designed for implementation in System on Chip technology, provides a new mechanism of communication on the fly, which combines dynamic switching of processors between SMP clusters with parallel data reads on the fly. This mechanism enables direct communication between processor data caches and eliminates many data transactions on memory buses. The paper presents the principles of the new architecture and evaluates its efficiency for matrix multiplication with recursive decomposition of matrices into quarters. The evaluation is done by simulation experiments with symbolic execution of parallel program graphs of different parallelization grain.
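
The recursive decomposition into quarters mentioned in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' program graph and it does not model dynamic SMP clusters or communication on the fly; it only shows how a product of square matrices breaks down into eight independent quarter-size products, which is where the parallelism comes from. The function names and the `grain` parameter (the block size at which recursion stops) are hypothetical, introduced here for illustration.

```python
def split_quarters(m):
    """Split a square matrix (list of rows, even size) into four quarters."""
    h = len(m) // 2
    top, bottom = m[:h], m[h:]
    return ([row[:h] for row in top], [row[h:] for row in top],
            [row[:h] for row in bottom], [row[h:] for row in bottom])


def join_quarters(c11, c12, c21, c22):
    """Reassemble four quarters into one matrix."""
    upper = [left + right for left, right in zip(c11, c12)]
    lower = [left + right for left, right in zip(c21, c22)]
    return upper + lower


def add(x, y):
    """Element-wise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def multiply_base(a, b):
    """Plain triple-loop product, used once blocks reach the chosen grain."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def multiply_recursive(a, b, grain=1):
    """Multiply square matrices by recursive decomposition into quarters.

    `grain` is an assumed tuning knob: blocks of this size or smaller
    (or of odd size) are multiplied directly instead of being split.
    """
    n = len(a)
    if n <= grain or n % 2:
        return multiply_base(a, b)
    a11, a12, a21, a22 = split_quarters(a)
    b11, b12, b21, b22 = split_quarters(b)

    def mul(x, y):
        return multiply_recursive(x, y, grain)

    # The eight sub-products are mutually independent, so a parallel
    # version could assign each of them to a separate processor.
    c11 = add(mul(a11, b11), mul(a12, b21))
    c12 = add(mul(a11, b12), mul(a12, b22))
    c21 = add(mul(a21, b11), mul(a22, b21))
    c22 = add(mul(a21, b12), mul(a22, b22))
    return join_quarters(c11, c12, c21, c22)
```

Calling `multiply_recursive(a, b, grain=2)` on 4x4 inputs stops splitting at 2x2 blocks, while `grain=1` recurses down to single elements; varying this value is one simple way to vary the parallelization grain of the resulting task graph.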