Failure resilient real-time data federation system

Authors:
Aakanksha Gagrani;Brijesh Pillai;Srikumar Krishnamoorthy
Affiliations:
Infosys Technologies Pvt. Ltd.;Infosys Technologies Pvt. Ltd.;Infosys Technologies Pvt. Ltd.
Venue:
SpringSim '09 Proceedings of the 2009 Spring Simulation Multiconference
Year:
2009


Abstract

Data federation systems virtualize access to enterprise data resources by integrating data from disparate, heterogeneous operational data sources on an on-demand, real-time basis. The key challenge of low-latency data access in such real-time data federation systems can be addressed by a grid-based scale-out architecture. However, resource failures in the grid can pose serious challenges for data federation, because query processing is federated across multiple grid nodes. In such real-time data federation systems, it is often desirable to recover from a failure and continue operation rather than repeat the entire process. This paper proposes a decentralized failure-recovery protocol for data federation systems based on a data-spaces architecture. Because the protocol is generic, it can be extended to applications other than data federation. Moreover, the protocol does not assume that any central repository is available for recovering from failure. We implement the proposed failure-recovery protocol in a simulation environment and present the key findings of the study.
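The abstract does not spell out the protocol's mechanics. As a purely illustrative sketch, assume that each grid node keeps its own replica of a shared data space and that a query fragment records its progress there after every step. Under those assumptions, a surviving peer can resume a failed fragment from the last checkpoint instead of repeating the whole computation, and no central repository is involved. The names `Node`, `checkpoint`, `process`, `recover`, and `federate`, the summation standing in for real query work, and the full replication of every checkpoint to every peer are hypothetical simplifications, not the authors' protocol.

```python
# Hypothetical sketch of decentralized checkpoint-and-resume; NOT the paper's protocol.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A grid node holding its own replica of the shared data space (hypothetical)."""
    name: str
    alive: bool = True
    # Replica of the data space: query id -> (rows completed, partial aggregate).
    space: Dict[str, Tuple[int, int]] = field(default_factory=dict)


def checkpoint(nodes: List[Node], qid: str, done: int, partial: int) -> None:
    """Write progress into every live peer's replica; no central repository is used."""
    for n in nodes:
        if n.alive:
            n.space[qid] = (done, partial)


def process(node: Node, nodes: List[Node], qid: str, rows: List[int],
            start: int, partial: int,
            crash_at: Optional[int] = None) -> Tuple[Optional[int], int]:
    """Sum rows[start:] on `node`, checkpointing after each row.

    Returns (result, rows_processed); result is None if the node crashed.
    """
    processed = 0
    for i in range(start, len(rows)):
        if crash_at is not None and i == crash_at:
            node.alive = False   # simulated node failure
            node.space.clear()   # the failed node's replica is lost with it
            return None, processed
        partial += rows[i]       # stand-in for real sub-query work
        processed += 1
        checkpoint(nodes, qid, i + 1, partial)
    return partial, processed


def recover(nodes: List[Node], qid: str) -> Tuple[Node, int, int]:
    """Any surviving peer holding a checkpoint can take over the fragment."""
    for n in nodes:
        if n.alive and qid in n.space:
            done, partial = n.space[qid]
            return n, done, partial
    raise RuntimeError("no surviving replica; fragment must restart")


def federate(rows: List[int], crash_at: Optional[int] = None) -> Tuple[int, int]:
    """Run one fragment on three hypothetical nodes; return (result, total rows processed)."""
    nodes = [Node("n1"), Node("n2"), Node("n3")]
    qid = "q1/frag0"
    result, work = process(nodes[0], nodes, qid, rows, 0, 0, crash_at)
    if result is None:  # owner failed: resume from the last checkpoint, do not restart
        survivor, done, partial = recover(nodes, qid)
        result, more = process(survivor, nodes, qid, rows, done, partial)
        work += more
    return result, work
```

With `rows = [1, 2, 3, 4, 5, 6]` and a crash before the fourth row, `federate(rows, crash_at=3)` returns `(21, 6)`: the second node resumes from the checkpoint `(3, 6)`, so every row is processed exactly once, which is the "continue rather than repeat" behaviour the abstract describes. Replicating each checkpoint to every peer is simply the easiest way to avoid a central repository in a toy example; it is not a claim about how the paper distributes state.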