Compiler-Assisted Synthesis of Algorithm-Based Checking in Multiprocessors
IEEE Transactions on Computers
We have developed an automated compile-time approach to generating error-detecting parallel programs. The compiler identifies statements that implement affine transformations within the program and automatically inserts code for computing, manipulating, and comparing checksums in order to detect data errors at runtime. Statements that do not implement affine transformations are checked by duplication. Where possible, checksums are reused from one loop to the next rather than recomputed for every statement. A global dataflow analysis determines the points at which checksums must be recomputed. We also use a novel method of specifying the distribution of the check data through data distribution directives, so that the computations on the original data and the corresponding check computations are performed on different processors. Results on the time overhead and error coverage of the error-detecting parallel programs, relative to the original programs, are presented for an Intel Paragon distributed-memory multicomputer.
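To make the checksum idea concrete, here is a minimal sequential sketch in Python. It is not the code the compiler generates, and every name in it (`affine`, `column_checksum`, `check_affine`, `checked_by_duplication`) is hypothetical. It assumes a statement of the form y = A x + b, for which the sum of the outputs must equal the column checksum of A applied to x plus the sum of b. Any other statement falls back to running the computation twice and comparing. In the approach described above, the check computation would be placed on a different processor from the original one; this sketch runs everything in one process.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def affine(A: Matrix, x: Vector, b: Vector) -> Vector:
    """Original computation: y = A x + b."""
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]


def column_checksum(A: Matrix) -> Vector:
    """Checksum row of A: the sum of each column (1^T A)."""
    return [sum(col) for col in zip(*A)]


def check_affine(A: Matrix, x: Vector, b: Vector, y: Vector,
                 tol: float = 1e-9) -> bool:
    """Return True if sum(y) agrees with the checksum prediction.

    Since y = A x + b, sum(y) = (1^T A) x + sum(b). A mismatch beyond the
    tolerance signals a data error in y (or in the check data itself).
    """
    expected = sum(c * xj for c, xj in zip(column_checksum(A), x)) + sum(b)
    return abs(sum(y) - expected) <= tol * max(1.0, abs(expected))


def checked_by_duplication(f: Callable, *args):
    """Fallback for non-affine statements: compute twice and compare."""
    first = f(*args)
    second = f(*args)  # assumption: would run on a different processor
    if first != second:
        raise RuntimeError("duplicated computations disagree")
    return first


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    x = [1.0, 1.0]
    b = [0.0, 1.0]
    y = affine(A, x, b)
    print(check_affine(A, x, b, y))   # fault-free result passes the check
    y[0] += 1.0                       # inject a single data error
    print(check_affine(A, x, b, y))   # corrupted result fails the check
```

The tolerance is needed because floating-point rounding makes the two sides of the checksum identity differ slightly even without faults. A single sum as used here detects an error but does not locate it, and errors that cancel in the sum go unnoticed.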