The reliability of software in distributed systems has always been a major concern for all stakeholders, especially application vendors and their users. Various models have been proposed to assess or predict the reliability of large-scale distributed applications, including e-government, e-commerce, multimedia services, and end-to-end automotive solutions, but reliability issues in these systems still exist. Ensuring the reliability of a distributed system in turn requires two things. First, the reliability of each individual component or factor involved in an enterprise distributed application must be examined before the reliability of the whole system can be predicted or assessed. Second, transparent fault detection and fault recovery schemes must be implemented to give end users seamless interaction. For this reason, we analyze existing reliability methodologies in detail from the viewpoint of examining the reliability of individual components, and explain why a comprehensive reliability model for applications running in distributed systems is still needed. This paper gives a detailed technical overview, in four parts, of recent research on analyzing and predicting the reliability of large-scale distributed applications. We first describe some pragmatic requirements for highly reliable systems and highlight the significance and various issues of reliability in different computing environments such as Cloud Computing, Grid Computing, and Service Oriented Architecture. We then elucidate possible factors and challenges that are nontrivial for highly reliable distributed systems, including fault detection, recovery, and removal through testing or various replication techniques. Next, we scrutinize research models that synthesize significant solutions to these factors and challenges in predicting and measuring the reliability of software applications in distributed systems.
Finally, we discuss the limitations of existing models and, in light of our analysis, propose future work for predicting and analyzing the reliability of distributed applications in real environments.
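The abstract argues that system-level reliability should be built up from the reliability of individual components, and it names replication as one way to tolerate faults. The following is a minimal illustrative sketch, not a model from the surveyed literature. It applies two textbook composition rules and assumes that components fail independently: a series system works only if every component works, and a replicated component works if at least one of its replicas works. The function names and the reliability values in the usage example are hypothetical.

```python
from math import prod


def series_reliability(reliabilities):
    """Reliability of a series system: every component must work.

    Assumes independent component failures (hypothetical simplification).
    """
    for r in reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reliability out of range: {r}")
    return prod(reliabilities)


def replicated_reliability(r, replicas):
    """Reliability of a component with `replicas` independent copies.

    The component works if at least one replica works, so it fails only
    when all replicas fail: 1 - (1 - r) ** replicas.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError(f"reliability out of range: {r}")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - r) ** replicas


if __name__ == "__main__":
    # Hypothetical three-tier application: front end, service, database.
    components = [0.99, 0.95, 0.98]
    print(f"series, no replication: {series_reliability(components):.5f}")

    # Replicate the weakest component (the service) twice.
    improved = [0.99, replicated_reliability(0.95, 2), 0.98]
    print(f"service replicated x2:  {series_reliability(improved):.5f}")
```

Under these assumptions, the whole-system figure (about 0.922 here) is lower than that of any single component. Replicating only the weakest component raises it to about 0.968, which is why component-level analysis is a natural starting point. Real deployments often have correlated failures, and these break the independence assumption.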