Exascale systems built from multi-core processors are expected to experience several component faults during code executions that last for hours. Detecting faults in processor cores is important so that faulty cores can be removed from scheduler pools, nodes with high failure rates can be swapped out, applications can be migrated, and checkpoint recoveries can be initiated. We propose lightweight codes that use chaotic computations and customized threads to detect component faults in multi-core processors. The codes concurrently execute dedicated threads that implement Poincaré and identity maps, which are customized to isolate faults in arithmetic operations, memory elements, and interconnects. Instruction execution errors and local memory errors are detected by threads dedicated to processor cores, and errors in inter-processor cross-connects are detected by global-local memory movements. We present preliminary implementation results on 4- and 48-core HP workstations under simulated faults.
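
The idea of detecting faults with chaotic computations can be illustrated with a minimal sketch. It is not the implementation reported here: it assumes the logistic map as a stand-in for the Poincaré map, runs sequentially rather than on threads dedicated to cores, and simulates a fault by adding a tiny error at one step. The names `logistic_trajectory`, `first_divergence`, and `identity_map_check` are hypothetical. Because a chaotic map amplifies small perturbations, a single small arithmetic error makes the trajectory drift away from a reference trajectory within a modest number of iterations, whereas an identity map should return data unchanged after a memory movement.

```python
def logistic_trajectory(x0, steps, a=4.0, fault_at=None, fault_eps=1e-12):
    """Iterate x -> a*x*(1-x) and return the list of visited states.

    Assumption: the logistic map stands in for the Poincare map.
    If fault_at is given, a tiny error is added at that step to
    simulate an arithmetic fault.
    """
    x = x0
    states = []
    for i in range(steps):
        x = a * x * (1.0 - x)
        if i == fault_at:
            x += fault_eps  # simulated fault (assumption, for illustration)
        states.append(x)
    return states


def first_divergence(reference, observed, tol=1e-6):
    """Return the first index where two trajectories differ by more than tol."""
    for i, (r, o) in enumerate(zip(reference, observed)):
        if abs(r - o) > tol:
            return i
    return None  # trajectories agree within tol: no fault detected


def identity_map_check(payload):
    """Copy a byte payload through a second buffer and verify it is unchanged.

    A stand-in for an identity map exercised by a memory movement.
    """
    moved = bytes(bytearray(payload))
    return moved == bytes(payload)


if __name__ == "__main__":
    reference = logistic_trajectory(0.3, 200)
    healthy = logistic_trajectory(0.3, 200)
    faulty = logistic_trajectory(0.3, 200, fault_at=10)

    print("healthy core diverges at:", first_divergence(reference, healthy))
    print("faulty core diverges at:", first_divergence(reference, faulty))
    print("identity map intact:", identity_map_check(b"\x00\xff" * 1024))
```

In a multi-core setting, one such trajectory would be computed by a thread dedicated to each core and compared against a trusted reference or against the other cores; how threads are bound to cores is platform-specific and is left out of this sketch.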