Latency Hiding in Message-Passing Architectures
Proceedings of the 8th International Symposium on Parallel Processing
One of the major goals in the design of parallel processing machines and algorithms is to achieve robustness and to reduce the overhead introduced when a problem is parallelized or a fault occurs. A key contributor to overhead is communication time, particularly when a node is faulty and another node is substituting for its operation. Many architectures attack this overhead by minimizing the raw cost of each communication, i.e., its latency and bandwidth figures. Another approach is to hide communication by overlapping it with computation, on the assumption that computation dominates the running time. This paper presents the mechanisms provided in the Proteus parallel computer and their effective use for communication hiding through overlapping communication/computation techniques, both with and without the presence of a fault. These techniques extend readily to compiler support for parallel programming. We also address the complexity (or rather simplicity) of achieving complete exchange on the Proteus machine.
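The overlap idea described above can be sketched in plain C, with a background thread standing in for a nonblocking transfer: while the current chunk is being processed, the next chunk is fetched into the other half of a double buffer. This is only an illustrative sketch of the general technique, not the Proteus primitives; `fetch`, `overlapped_sum`, and the double-buffering scheme are assumptions for the example.

```c
#include <pthread.h>
#include <string.h>

#define CHUNK   4
#define NCHUNKS 3

/* Remote data, delivered one chunk at a time. */
static int source[NCHUNKS][CHUNK] = {
    {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}
};

struct fetch_args { int *dst; int chunk; };

/* Stand-in for a nonblocking receive: copy one chunk into a buffer. */
static void *fetch(void *p) {
    struct fetch_args *a = p;
    memcpy(a->dst, source[a->chunk], sizeof source[a->chunk]);
    return NULL;
}

/* Sum all chunks, prefetching chunk k+1 while computing on chunk k. */
long overlapped_sum(void) {
    int buf[2][CHUNK];
    long total = 0;
    memcpy(buf[0], source[0], sizeof buf[0]);   /* first chunk: blocking */
    for (int k = 0; k < NCHUNKS; k++) {
        pthread_t t;
        struct fetch_args a = { buf[(k + 1) & 1], k + 1 };
        int started = 0;
        if (k + 1 < NCHUNKS) {                  /* "post" the next transfer */
            pthread_create(&t, NULL, fetch, &a);
            started = 1;
        }
        for (int i = 0; i < CHUNK; i++)         /* compute on current chunk */
            total += buf[k & 1][i];
        if (started)
            pthread_join(&t, NULL);             /* "wait" for the transfer */
    }
    return total;
}
```

The computation and the prefetch touch different halves of the buffer, so the transfer cost for every chunk after the first is hidden behind the preceding chunk's computation.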