Fault tolerant parallel data-intensive algorithms

  • Authors:
  • Mucahid Kutlu;Gagan Agrawal;Oguz Kurt

  • Affiliations:
  • Ohio State University, Columbus, OH, USA;Ohio State University, Columbus, OH, USA;Ohio State University, Columbus, OH, USA

  • Venue:
  • Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
  • Year:
  • 2012

Abstract

Fault tolerance is rapidly becoming a crucial issue in high-end and distributed computing, as the increasing number of cores decreases the mean time to failure of these systems. In this work, we present an algorithm-based fault tolerance solution that handles fail-stop failures for a class of iterative data-intensive algorithms. We intelligently replicate the data to minimize data loss under multiple failures, and we reduce re-execution during recovery with only small modifications to the algorithms. We evaluate our approach using two data mining algorithms, k-means and Apriori. We show that our approach has negligible overhead and gracefully handles different numbers of failures. In addition, our approach outperforms Hadoop both in the absence and in the presence of failures.
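
The abstract does not describe the replication scheme itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical round-robin placement in which each data chunk is copied to a fixed number of consecutive nodes, and a sum reduction standing in for one iteration of an algorithm such as k-means. All function names and parameters here are illustrative assumptions.

```python
def place_replicas(num_chunks, num_nodes, replicas):
    """Map each chunk to the nodes holding a copy (hypothetical round-robin placement)."""
    return {c: [(c + j) % num_nodes for j in range(replicas)]
            for c in range(num_chunks)}


def assign_work(placement, failed):
    """Give each chunk to its first surviving holder; collect chunks with no survivor."""
    work, lost = {}, []
    for chunk, holders in placement.items():
        alive = [n for n in holders if n not in failed]
        if alive:
            work.setdefault(alive[0], []).append(chunk)
        else:
            lost.append(chunk)
    return work, lost


def iteration_sum(chunks, work):
    """One reduction step: each node sums its chunks, then the partial sums are combined."""
    return sum(sum(sum(chunks[c]) for c in owned) for owned in work.values())


chunks = [[1, 2], [3, 4], [5, 6], [7, 8]]
placement = place_replicas(num_chunks=4, num_nodes=4, replicas=2)

# No failures: every chunk is processed by its primary holder.
work_ok, _ = assign_work(placement, failed=set())

# Node 1 fails: its chunks are picked up by the surviving replica holders.
work_fail, lost = assign_work(placement, failed={1})
```

In this sketch, a single node failure loses no data, so the reduction gives the same result as the failure-free run. When both holders of a chunk fail (for example nodes 1 and 2), that chunk is reported as lost, which is the case a replication scheme aims to make rare.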