The paper concerns a special architecture of dynamic shared-memory processor (SMP) clusters that are organized at program run time. The architecture is designed for implementation in System-on-Chip technology and provides a new mechanism called communication on the fly, which combines dynamic switching of processors between SMP clusters with parallel data reads on the fly. This mechanism enables direct communication between processor data caches and eliminates many data transactions on memory buses. The paper presents the principles of the new architecture and evaluates its efficiency for matrix multiplication with recursive decomposition of matrices into quarters. The evaluation is done by simulation experiments based on symbolic execution of parallel program graphs with different parallelization grains.
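As an illustration of the benchmark only, the recursive decomposition of matrices into quarters can be sketched sequentially as below. This is a minimal sketch and not the authors' parallel program graph: the function names and the `grain` parameter are hypothetical, and `grain` merely stands in for the parallelization grain by setting the block size at which recursion stops. It models nothing of the SMP clusters or of communication on the fly.

```python
def split_quarters(m):
    """Split a square matrix (list of rows, even size) into four quarters."""
    h = len(m) // 2
    top, bottom = m[:h], m[h:]
    return ([row[:h] for row in top], [row[h:] for row in top],
            [row[:h] for row in bottom], [row[h:] for row in bottom])


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def join_quarters(c11, c12, c21, c22):
    """Reassemble four quarters into one matrix."""
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


def multiply(a, b, grain=1):
    """Multiply square matrices by recursive decomposition into quarters.

    `grain` (hypothetical parameter) is the block size at or below which
    the block is multiplied directly instead of being split further.
    """
    n = len(a)
    if n <= grain or n % 2:
        # Leaf block: ordinary triple-loop multiplication.
        return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    a11, a12, a21, a22 = split_quarters(a)
    b11, b12, b21, b22 = split_quarters(b)
    # Each result quarter is the sum of two quarter products; the eight
    # products are the independent tasks a parallel version could distribute.
    c11 = add(multiply(a11, b11, grain), multiply(a12, b21, grain))
    c12 = add(multiply(a11, b12, grain), multiply(a12, b22, grain))
    c21 = add(multiply(a21, b11, grain), multiply(a22, b21, grain))
    c22 = add(multiply(a21, b12, grain), multiply(a22, b22, grain))
    return join_quarters(c11, c12, c21, c22)
```

A smaller `grain` yields more and finer tasks, while a larger one yields fewer and coarser tasks; the result is the same in both cases.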