Large distributed systems challenge traditional schedulers because it is often hard to determine a priori how long each task will take to complete on each resource, yet such schedulers require this information as input. Task replication has been applied in a variety of scenarios to circumvent this problem. It consists of dispatching multiple replicas of a task and using the result from the first replica to finish. Replication schedulers (i.e., schedulers that employ task replication) can achieve good performance even in the absence of information about tasks and resources. They are also less complex than traditional schedulers, which makes them better suited to large distributed systems. On the other hand, replication schedulers waste cycles on the replicas that are not the first to finish. Moreover, this extra resource consumption raises serious concerns about the system-wide performance of a distributed system with multiple, competing replication schedulers. This paper presents a comprehensive study of task replication, comparing replication schedulers against traditional information-based schedulers and establishing their efficacy (the performance delivered to the application), efficiency (the amount of resources wasted), and emergent behavior (the system-wide behavior of a system with multiple replication schedulers). We also introduce a simple access control strategy that can be implemented locally by each resource and greatly improves the overall performance of a system in which multiple replication schedulers compete for resources.
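The replication mechanism described in the abstract (dispatch several replicas, keep the first result) can be illustrated with a minimal sketch. This is not the paper's scheduler: the function name `run_with_replication`, the `delays` parameter, and the use of local threads with simulated delays in place of real grid resources are all assumptions made for illustration.

```python
import concurrent.futures as cf
import time


def run_with_replication(task, arg, delays):
    """Dispatch one replica of `task(arg)` per entry in `delays` and
    return the result of the first replica to finish.

    `delays` is a hypothetical stand-in for resource speeds that the
    scheduler does not know in advance; each replica sleeps for its
    delay before computing.
    """
    def replica(delay):
        time.sleep(delay)          # simulated time on a slow or fast resource
        return task(arg), delay

    pool = cf.ThreadPoolExecutor(max_workers=len(delays))
    futures = [pool.submit(replica, d) for d in delays]

    # Block only until the first replica completes.
    first = next(cf.as_completed(futures))

    # Replicas already running cannot be cancelled here; they keep
    # consuming cycles, which is the waste the abstract refers to.
    for f in futures:
        f.cancel()
    pool.shutdown(wait=False)

    result, winner_delay = first.result()
    wasted_replicas = len(futures) - 1
    return result, winner_delay, wasted_replicas
```

For example, `run_with_replication(lambda x: x * x, 7, [0.3, 0.05, 0.2])` returns the result computed by the replica with the shortest delay, without the caller needing to know beforehand which resource is fastest. The two remaining replicas still run to completion, which is the efficiency cost the study sets out to quantify.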