Characterizing result errors in internet desktop grids

Authors:
Derrick Kondo; Filipe Araujo; Paul Malecot; Patricio Domingues; Luis Moura Silva; Gilles Fedak; Franck Cappello
Affiliations:
INRIA Futurs, France; University of Coimbra, Portugal; INRIA Futurs, France; Polytechnic Institute of Leiria, Portugal; University of Coimbra, Portugal; INRIA Futurs, France; INRIA Futurs, France
Venue:
Euro-Par'07: Proceedings of the 13th International Euro-Par Conference on Parallel Processing
Year:
2007

Abstract

Desktop grids use the free resources in Intranet and Internet environments for large-scale computation and storage. While desktop grids offer a high return on investment, one critical issue is the validation of results returned by participating hosts. Several mechanisms for result validation have been proposed previously. However, the characteristics of the errors themselves are poorly understood. To study error rates, we implemented and deployed a desktop grid application across several thousand hosts distributed over the Internet. We then analyzed the results to give a quantitative and empirical characterization of errors stemming from input or output (I/O) failures. We find that in practice, errors are widespread across hosts but occur relatively infrequently. Moreover, we find that error rates tend to be neither stationary over time nor correlated between hosts. In light of these characterization results, we evaluate state-of-the-art error detection mechanisms and describe the trade-offs of using each mechanism.