This paper presents efficient architectural solutions for systems built from shared-memory processor clusters. In the proposed architecture, processors can be switched dynamically between bus-based SMP clusters at program run time. A switched processor carries data in its cache into the new cluster; when it writes that data to the cluster memory, the other processors in the cluster can read it on the fly. This new inter-cluster data transfer paradigm is called communication on the fly. For execution in the proposed architecture, programs are structured according to macro-data flow graphs in which task composition and communication are defined so as to eliminate reloading of data caches during task execution. The paper also presents an extended macro-data flow graph representation that models program execution control in the system, including parallel task execution, data cache operation, data bus arbiters, switching of processors between clusters, and multiple parallel reads of data on the fly. Simulation results are presented for a very fine-grained parallel numerical example.
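The data path described above can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions, not the authors' simulator or hardware design: the names `Processor`, `Cluster`, `switch`, and `write_with_reads_on_fly` are hypothetical, caches and memories are plain dictionaries, and bus arbitration, timing, and cache coherence are ignored. It shows a processor computing a value in one cluster, being switched to another cluster with its cache contents intact, and then writing the value to the new cluster's memory while the designated readers capture it during the same write.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Processor:
    """Hypothetical processor with a private data cache (a dict)."""
    name: str
    cache: dict = field(default_factory=dict)


@dataclass(eq=False)
class Cluster:
    """Hypothetical bus-based SMP cluster with one shared memory."""
    name: str
    memory: dict = field(default_factory=dict)
    processors: list = field(default_factory=list)

    def attach(self, proc):
        self.processors.append(proc)

    def detach(self, proc):
        self.processors.remove(proc)

    def write_with_reads_on_fly(self, writer, key, readers):
        """Writer copies a cached value to cluster memory; each listed
        reader in this cluster captures the value during the same write
        (assumed here to model a read on the fly)."""
        value = writer.cache[key]
        self.memory[key] = value
        for reader in readers:
            if reader in self.processors and reader is not writer:
                reader.cache[key] = value


def switch(proc, src, dst):
    """Move a processor between clusters; its cache contents travel with it."""
    src.detach(proc)
    dst.attach(proc)


# Usage: P0 produces x in cluster A, then delivers it to cluster B.
a, b = Cluster("A"), Cluster("B")
p0, p1, p2 = Processor("P0"), Processor("P1"), Processor("P2")
a.attach(p0)
b.attach(p1)
b.attach(p2)

p0.cache["x"] = 42                 # result computed while in cluster A
switch(p0, a, b)                   # processor switching at run time
b.write_with_reads_on_fly(p0, "x", readers=[p1, p2])
```

In this sketch the value never passes through cluster A's memory: it reaches cluster B's memory and the caches of `p1` and `p2` through a single write, which is the effect the abstract attributes to communication on the fly.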