With the advent of next-generation scientific applications, the workflow approach, which integrates various computing and networking technologies, has provided a viable solution for managing and optimizing large-scale distributed data transfer, processing, and analysis. This paper investigates the problem of mapping distributed scientific workflows for maximum throughput in faulty networks, where nodes and links are subject to probabilistic failures. We formulate this problem as a bi-objective optimization that maximizes both throughput and reliability. By adapting and modifying a centralized fault-free workflow mapping scheme, we propose a new mapping algorithm that achieves high throughput for smooth data flow in a distributed manner while satisfying a pre-specified bound on the overall failure rate, thereby guaranteeing a given level of reliability. The performance superiority of the proposed solution is illustrated both by extensive simulation-based comparisons with existing algorithms and by experimental results from a real-life scientific workflow deployed in wide-area networks.
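To make the bi-objective formulation concrete, the following is a minimal sketch (not the paper's algorithm) of how a single candidate mapping might be evaluated under the stated model: throughput of a pipelined workflow is limited by its slowest stage, reliability is the product of per-node success probabilities, and a mapping is feasible only if the overall failure rate stays within the pre-specified bound. The function name, inputs, and the purely multiplicative reliability model are illustrative assumptions.

```python
def evaluate_mapping(stage_rates, node_fail_probs, max_failure_rate):
    """Evaluate one candidate workflow-to-network mapping (illustrative model).

    stage_rates[i]     -- data-processing rate of stage i on its assigned node
    node_fail_probs[i] -- failure probability of the node hosting stage i
    max_failure_rate   -- pre-specified bound on the overall failure rate
    """
    # Throughput of a pipeline is determined by the bottleneck stage.
    throughput = min(stage_rates)

    # Assume independent failures: the workflow succeeds only if every
    # hosting node survives, so reliability is the product of successes.
    reliability = 1.0
    for p in node_fail_probs:
        reliability *= (1.0 - p)

    # Feasible iff the overall failure rate meets the reliability bound.
    feasible = (1.0 - reliability) <= max_failure_rate
    return throughput, reliability, feasible

# Example: a three-stage workflow with a 10% overall failure-rate bound.
tp, rel, ok = evaluate_mapping([5.0, 3.0, 4.0], [0.01, 0.02, 0.01], 0.10)
```

A mapping algorithm of the kind described would search over candidate assignments, keeping only those for which `feasible` holds and maximizing `throughput` among them.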