Fault Tolerant Parallel FFT Using Parallel Failure Recovery

Authors:
Hongyi Fu; Xuejun Yang
Venue:
ICCSA '09 Proceedings of the 2009 International Conference on Computational Science and Its Applications
Year:
2009

Citing 0
Cited 1

A study of application-level recovery methods for transient network faults

ScalA '13 Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems


Abstract

This paper introduces a new fault-tolerance method for parallel programs based on parallel failure recovery. When a process fails, the surviving processes recompute the failed process's task in parallel, which reduces the overhead of fault tolerance. The paper presents the design and implementation of a parallel FFT using this approach and determines the optimal number of processes that should participate in parallel failure recovery. Finally, experiments show that parallel failure recovery outperforms checkpointing and confirm the effectiveness of the proposed method for choosing the number of processes that participate in parallel failure recovery.
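The abstract describes the recovery idea only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It simulates the row-FFT phase of a 2D FFT whose rows are block-distributed over processes. When one process fails, its rows are split round-robin among k survivors, assuming the failed process's input rows can still be re-read (for example, from replicated or stored input). The choice of k uses a hypothetical cost model, T(k) = work/k + overhead·k (division of work versus per-helper coordination cost); the paper's actual model is not given in the abstract. The helper names (`assign_rows`, `recover`, `recovery_time`, `best_k`, `simulate`) are illustrative, and the "parallel" recovery runs sequentially here.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def assign_rows(n_rows, n_procs):
    """Block distribution: process p owns a contiguous range of rows."""
    base, extra = divmod(n_rows, n_procs)
    owners, start = {}, 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        owners[p] = list(range(start, start + size))
        start += size
    return owners


def recover(failed_rows, survivors, k):
    """Split the failed process's rows round-robin across the first k survivors."""
    helpers = survivors[:k]
    plan = {p: [] for p in helpers}
    for i, row in enumerate(failed_rows):
        plan[helpers[i % k]].append(row)
    return plan


def recovery_time(k, work, overhead):
    """Hypothetical model: work shrinks as 1/k, coordination cost grows with k."""
    return work / k + overhead * k


def best_k(work, overhead, max_k):
    """Number of helpers (1..max_k) minimizing the hypothetical recovery time."""
    return min(range(1, max_k + 1), key=lambda k: recovery_time(k, work, overhead))


def simulate(n_rows=16, n_cols=8, n_procs=4, failed=2):
    """Row-FFT phase with one failed process; returns (max error vs. reference, k)."""
    data = [[complex(r * n_cols + c) for c in range(n_cols)] for r in range(n_rows)]
    owners = assign_rows(n_rows, n_procs)
    survivors = [p for p in range(n_procs) if p != failed]
    result = {}
    # Normal phase: each surviving process transforms its own rows.
    for p in survivors:
        for r in owners[p]:
            result[r] = fft(data[r])
    # Recovery phase: k survivors share the failed process's rows.
    k = best_k(work=len(owners[failed]), overhead=0.5, max_k=len(survivors))
    for p, rows in recover(owners[failed], survivors, k).items():
        for r in rows:
            result[r] = fft(data[r])
    # Compare against a failure-free run to confirm every row was recovered.
    reference = [fft(row) for row in data]
    err = max(abs(a - b) for r in range(n_rows) for a, b in zip(result[r], reference[r]))
    return err, k
```

With the illustrative parameters, `simulate()` recovers all rows of the failed process and picks k = 3 helpers. Under this toy model the optimum is near k = sqrt(work/overhead), capped by the number of survivors: `best_k(100, 4, 10)` returns 5, which reflects the trade-off the abstract describes, where adding helpers shortens recomputation but increases coordination cost.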