The CDAG: a data structure for automatic parallelization for a multithreaded architecture
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Tailoring a self-distributing architecture to a cluster computer environment
EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
For parallel and distributed systems to gain wider acceptance than they have to date, they must become significantly easier to program. Fundamentally, parallel programming is more difficult than sequential programming as long as data and computation must be distributed by the programmer. Cache Only Memory Architectures (COMAs) provide a Distributed Shared Memory (DSM) where data distribution is performed automatically and transparently. This paper generalizes this idea to achieve the same distribution for computation, thus arriving at an automatic and transparent form of scheduling. Once this is accomplished, parallel computers become approximately as easy to program as sequential computers. Indeed, it becomes possible to recompile a large class of "dusty decks" for parallel and distributed architectures. The approach proposed in this paper builds upon techniques originally developed for multithreaded and dataflow architectures. This necessitates some changes, and permits some optimizations, to the coherency protocols used to implement the underlying COMA.