A Parallel System Architecture Based on Dynamically Configurable Shared Memory Clusters
PPAM '01 Proceedings of the International Conference on Parallel Processing and Applied Mathematics, Revised Papers
The paper proposes a task scheduling algorithm for a multiprocessor system based on dynamically organised shared-memory processor clusters. A cluster contains processors with data caches, connected to a data memory module by an internal cluster bus. Each data memory module is also accessible through a global inter-cluster bus that is available to all processors. Tasks execute in a processor according to a specific macro dataflow model: a task may run only once all the data it requires have been loaded into the processor's data cache. The task scheduling algorithm maps program tasks onto dynamic processor clusters on the basis of a program graph analysis. A program is represented by a macro dataflow graph extended with representations of the actions of bus arbiters, data caches and memory modules. The resulting dynamic structuring of processor clusters minimizes the parallel program execution time. The algorithm is based on a modified Dominant Sequence Clustering approach.
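
To make the idea behind Dominant Sequence Clustering concrete, the sketch below computes the length of the dominant sequence (the longest path through a task graph, counting computation and communication costs) for a given assignment of tasks to clusters. It is not the paper's modified algorithm. It is a minimal illustration under loudly simplifying assumptions: every task has its own processor, data passed between tasks in the same cluster costs nothing, and bus arbiters, caches and memory modules are not modelled. The function name `dominant_sequence_length`, the task names and all cost values are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def dominant_sequence_length(comp, comm, cluster):
    """Return the longest-path length of a task DAG for a given clustering.

    comp:    {task: computation cost}
    comm:    {(src, dst): communication cost of the edge src -> dst}
    cluster: {task: cluster id}; an edge inside one cluster is treated as free
             (an assumption of this sketch, not a claim about the paper)
    """
    # Build the predecessor map that TopologicalSorter expects.
    preds = {task: set() for task in comp}
    for src, dst in comm:
        preds[dst].add(src)

    finish = {}
    for task in TopologicalSorter(preds).static_order():
        start = 0
        for p in preds[task]:
            # Macro dataflow rule: a task starts only after all inputs arrive.
            cost = 0 if cluster[p] == cluster[task] else comm[(p, task)]
            start = max(start, finish[p] + cost)
        finish[task] = start + comp[task]
    return max(finish.values())


# Hypothetical four-task graph: a feeds b and c, which both feed d.
comp = {"a": 2, "b": 3, "c": 4, "d": 1}
comm = {("a", "b"): 5, ("a", "c"): 1, ("b", "d"): 2, ("c", "d"): 6}

separate = {"a": 0, "b": 1, "c": 2, "d": 3}   # every task in its own cluster
merged = {"a": 0, "b": 0, "c": 0, "d": 0}     # all tasks in one cluster

print(dominant_sequence_length(comp, comm, separate))
print(dominant_sequence_length(comp, comm, merged))
```

Comparing the two printed values shows why clustering matters: placing communicating tasks in the same cluster removes communication cost from the dominant sequence and shortens the estimated parallel time. A full scheduler would also account for tasks that share a processor and for contention on the cluster and inter-cluster buses.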