The paper proposes a new architecture, and corresponding task scheduling algorithms, for a multiprocessor system based on dynamically organised shared memory clusters. The clusters are organised around memory modules placed in a common address space. Each memory module can be accessed through a local cluster bus and through a common inter-cluster bus. Tasks are executed in a processor according to a specific macro dataflow model, which allows a task to execute only if all the data it needs have been loaded into the processor's data cache. Data cache pre-fetching, together with the single assignment data move principle, eliminates cache thrashing and the cache coherence problem. An extended macro dataflow graph representation is introduced that enables modelling of the data bus arbiters, memory modules and data caches in the system. A task scheduling algorithm is proposed that maps program tasks onto dynamic processor clusters on the basis of program graph analysis. The algorithm is based on a modified Dominant Sequence Clustering approach and structures the clusters dynamically so as to minimise program execution time.
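The firing rule of the macro dataflow model described above (a task may execute only once all of its data are in the processor's data cache) can be illustrated with a minimal sketch. The names `Task`, `Processor`, `prefetch`, `can_fire` and `run` are hypothetical and are not taken from the paper. The cache is modelled as a plain set of data item names, so buses, arbiters and memory modules are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task node: names of the data items it reads and produces."""
    name: str
    inputs: frozenset
    outputs: frozenset


@dataclass
class Processor:
    """Hypothetical processor whose data cache is modelled as a set of item names."""
    cache: set = field(default_factory=set)

    def prefetch(self, item):
        # Assumption: a pre-fetch simply makes the item resident in the cache.
        self.cache.add(item)

    def can_fire(self, task):
        # Firing rule from the abstract: every input must already be cached.
        return task.inputs <= self.cache

    def run(self, task):
        if not self.can_fire(task):
            missing = sorted(task.inputs - self.cache)
            raise RuntimeError(f"{task.name} is blocked; missing data: {missing}")
        # Assumption: results of a task are written into the local cache.
        self.cache |= task.outputs
```

In this sketch a task that reads `a` and `b` stays blocked after only `a` has been pre-fetched, and becomes runnable once `b` is also resident. A scheduler along the lines of the abstract would additionally have to decide which cluster each processor belongs to at that moment, which is outside the scope of this illustration.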