The co-replication methodology and its application to structured parallel programs

Authors:
Carlo Bertolli;Massimo Coppola;Corrado Zoccolo
Affiliations:
University of Pisa, Pisa, Italy;University of Pisa/Institute of Information Science and Technologies, Pisa, Italy;IAC Search & Media Italia S.r.l., Pisa, Italy
Venue:
Proceedings of the 2007 symposium on Component and framework technology in high-performance and scientific computing
Year:
2007

Abstract

We introduce Co-Replication, a technique that exploits abstract properties of a computation to allow parallel replicas of a software module to cooperate, enhancing both the reliability and the availability of the resulting component and providing a flexible trade-off between the two properties. In Co-Replication, a complete partial ordering is defined on the computation state. The formal expression of the state combination operation among replicas allows them to compute independently as a co-algorithm, and to exploit low-overhead, opportunistic strategies for spreading results and surviving faults. Co-Replication suits structured parallel and component-based programming, as it needs a high-level description of the computation properties, and thus can ease the exploitation of non-fault-free parallel platforms such as large clusters and Grids. We describe the theoretical foundations of Co-Replication and investigate the use of random gossiping strategies for the state combination. To show the applicability of the technique, we discuss the modeling of Master-Slave and task farm computations, and report test results over two applications.
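
As an informal illustration of the idea of combining partially ordered replica states through random gossip, the following Python sketch may help. It is not the authors' algorithm or formalism. It assumes, purely for illustration, that a task farm replica's state is a map from task id to result, that states are ordered by inclusion, and that the combination operation is the union of two maps. The names `combine`, `gossip_round`, and `run`, the squaring workload, and the push-only gossip scheme are all hypothetical.

```python
import random


def combine(a, b):
    """Assumed combination operation: the union of two result maps.

    Under the inclusion ordering this is the least upper bound of the
    two states, provided both agree on any task they share.
    """
    merged = dict(a)
    merged.update(b)
    return merged


def gossip_round(states, rng):
    """Each replica pushes its state to one randomly chosen other replica."""
    n = len(states)
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])
        states[j] = combine(states[j], states[i])


def run(tasks, n_replicas, seed=0, max_rounds=1000):
    """Simulate replicas (at least two) that each computed a slice of tasks."""
    rng = random.Random(seed)
    # Hypothetical workload: the result of task t is t squared.
    full = {t: t * t for t in tasks}
    # Each replica starts with the results for a disjoint slice of tasks.
    states = [{t: t * t for t in tasks[r::n_replicas]}
              for r in range(n_replicas)]
    rounds = 0
    # Gossip until every replica holds the complete result set,
    # or until the safety bound on the number of rounds is reached.
    while rounds < max_rounds and any(s != full for s in states):
        gossip_round(states, rng)
        rounds += 1
    return states, rounds


if __name__ == "__main__":
    final_states, rounds = run(list(range(8)), n_replicas=4, seed=1)
    print("rounds needed:", rounds)
    print("all replicas complete:",
          all(len(s) == 8 for s in final_states))
```

Because the assumed combination is commutative and idempotent, and associative as well, replicas can merge states in any order and any number of times without coordination, which is what lets a gossip scheme of this kind spread results opportunistically.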