Performance under Failures of MapReduce Applications

Authors:
Hui Jin;Kan Qiao;Xian-He Sun;Ying Li
Affiliations:
-;-;-;-
Venue:
CCGRID '11 Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Year:
2011

Citing 0
Cited 3

Failure scenario as a service (FSaaS) for Hadoop clusters

Proceedings of the Workshop on Secure and Dependable Middleware for Cloud Monitoring and Management
A characteristic study on failures of production distributed data-parallel programs

Proceedings of the 2013 International Conference on Software Engineering
Performance comparison under failures of MPI and MapReduce: An analytical approach

Future Generation Computer Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

The MapReduce programming paradigm is gaining more and more popularity in recent years due to its ability in supporting easy programming, data distribution, as well as fault tolerance. Failure is an unwanted but inevitable fact that all large-scale parallel computing systems have to face with. MapReduce introduces a novel data replication and task reexecution strategy for fault tolerance. This study intends to lead a better understanding of such fault tolerance mechanisms. In particular, we build a stochastic performance model to quantify the impact of failures on MapReduce applications and to investigate its effectiveness under different computing environments. Simulations also have been carried out to verify the accuracy of the proposed model. Our results show that data replication is an effective approach even when failure rate is high, and the task migration mechanism of MapReduce works well in balancing the reliability difference among individual nodes. This work provides a theoretical foundation for optimizing large-scale MapReduce applications, especially when fault tolerance is the concern.