Fault tolerant parallel data-intensive algorithms

Authors:
Mucahid Kutlu;Gagan Agrawal;Oguz Kurt
Affiliations:
Ohio State University, Columbus, OH, USA;Ohio State University, Columbus, OH, USA;Ohio State University, Columbus, OH, USA
Venue:
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Year:
2012

Citing 2
Cited 1

MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6

Toward Exascale Resilience

International Journal of High Performance Computing Applications

kMemvisor: flexible system wide memory mirroring in virtual environments

Proceedings of the 22nd international symposium on High-performance parallel and distributed computing

Abstract

Fault tolerance is rapidly becoming a crucial issue in high-end and distributed computing, as the increasing number of cores decreases the mean time to failure of systems. In this work, we present an algorithm-based fault tolerance solution that handles fail-stop failures for a class of iterative data-intensive algorithms. We intelligently replicate the data to minimize data loss under multiple failures, and we reduce re-execution during recovery with only minor modifications to the algorithms. We evaluate our approach using two data mining algorithms, k-means and Apriori. We show that our approach has negligible overhead and gracefully handles varying numbers of failures. In addition, our approach outperforms Hadoop both in the absence and in the presence of failures.
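
The general idea of combining data replication with limited re-execution can be illustrated with a small sketch. This is not the authors' implementation: the round-robin placement policy, the function names (`place_chunks`, `local_reduce`, `run_iteration`), and the one-dimensional k-means step are all assumptions chosen for illustration. It shows only that, when each data chunk is held by more than one node and the per-chunk results are merged by an associative reduction, a fail-stop failure can be absorbed by having a surviving replica holder process the affected chunks.

```python
from collections import defaultdict

def place_chunks(num_chunks, num_nodes, replicas=2):
    """Assign each chunk to `replicas` distinct nodes.
    Round-robin placement is a hypothetical policy, not the paper's."""
    return {c: [(c + r) % num_nodes for r in range(replicas)]
            for c in range(num_chunks)}

def local_reduce(points, centroids):
    """Per-chunk k-means partial result: {centroid index: [sum, count]}."""
    acc = defaultdict(lambda: [0.0, 0])
    for p in points:
        # Nearest centroid in one dimension.
        k = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        acc[k][0] += p
        acc[k][1] += 1
    return acc

def run_iteration(chunks, placement, centroids, failed=frozenset()):
    """One k-means step that tolerates failed nodes.
    A chunk is processed as long as at least one of its holders survives."""
    total = defaultdict(lambda: [0.0, 0])
    for c, holders in placement.items():
        alive = [n for n in holders if n not in failed]
        if not alive:
            raise RuntimeError(f"chunk {c} lost: all replicas failed")
        # Partial results are merged by addition, which is associative,
        # so it does not matter which surviving node produced them.
        for k, (s, n) in local_reduce(chunks[c], centroids).items():
            total[k][0] += s
            total[k][1] += n
    # Keep the old centroid if no point was assigned to it.
    return [total[k][0] / total[k][1] if total[k][1] else centroids[k]
            for k in range(len(centroids))]

chunks = [[1.0, 2.0], [9.0, 10.0], [1.5, 9.5]]
placement = place_chunks(num_chunks=3, num_nodes=3)
healthy = run_iteration(chunks, placement, [0.0, 10.0])
degraded = run_iteration(chunks, placement, [0.0, 10.0], failed={0})
```

In this toy setting the run with one failed node yields the same centroids as the failure-free run, while losing both holders of a chunk raises an error. How replicas are actually placed to minimize loss under multiple failures is the subject of the paper and is not modeled here.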