On the efficacy, efficiency and emergent behavior of task replication in large distributed systems

  • Authors:
  • Walfredo Cirne; Francisco Brasileiro; Daniel Paranhos; Luís Fabrício W. Góes; William Voorsluys

  • Affiliations:
  • Universidade Federal de Campina Grande, Departamento de Sistemas e Computação, Brazil; Universidade Federal de Campina Grande, Departamento de Sistemas e Computação, Brazil; Universidade Federal de Campina Grande, Departamento de Sistemas e Computação, Brazil; Pontifícia Universidade Católica de Minas Gerais, Instituto de Informática, Brazil; Universidade Federal de Campina Grande, Departamento de Sistemas e Computação, Brazil

  • Venue:
  • Parallel Computing
  • Year:
  • 2007

Abstract

Large distributed systems challenge traditional schedulers, as it is often hard to determine a priori how long each task will take to complete on each resource, information that such schedulers require as input. Task replication has been applied in a variety of scenarios as a way to circumvent this problem. Task replication consists of dispatching multiple replicas of a task and using the result from the first replica to finish. Replication schedulers (i.e. schedulers that employ task replication) are able to achieve good performance even in the absence of information on tasks and resources. They are also less complex than traditional schedulers, making them better suited to large distributed systems. On the other hand, replication schedulers waste cycles on the replicas that are not the first to finish. Moreover, this extra consumption of resources raises severe concerns about the system-wide performance of a distributed system with multiple, competing replication schedulers. This paper presents a comprehensive study of task replication, comparing replication schedulers against traditional information-based schedulers, and establishing their efficacy (the performance delivered to the application), efficiency (the amount of resources wasted), and emergent behavior (the system-wide behavior of a system with multiple replication schedulers). We also introduce a simple access control strategy that can be implemented locally by each resource and greatly improves the overall performance of a system in which multiple replication schedulers compete for resources.
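
The replication idea described in the abstract (dispatch several replicas of a task, keep the result of the first to finish) can be illustrated with a minimal Python sketch. This is not the authors' scheduler: the function name `run_replicated`, the use of threads, and the `resource_delays` list that simulates unknown per-resource run times are all assumptions made for illustration only.

```python
import concurrent.futures as cf
import time


def run_replicated(task, resource_delays):
    """Dispatch one replica of `task` per simulated resource and return
    (result, winner_index) from the first replica to finish.

    `resource_delays` is a hypothetical stand-in for run times that a
    real scheduler would not know in advance.
    """
    def replica(index, delay):
        time.sleep(delay)        # simulated (unknown) execution time
        return task(), index

    pool = cf.ThreadPoolExecutor(max_workers=len(resource_delays))
    futures = [pool.submit(replica, i, d)
               for i, d in enumerate(resource_delays)]

    # Block until at least one replica has completed.
    done, pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)

    # cancel() only stops replicas that have not started yet; replicas
    # already running keep consuming cycles, which is the waste the
    # abstract refers to.
    for future in pending:
        future.cancel()
    pool.shutdown(wait=False)

    return next(iter(done)).result()
```

Calling `run_replicated(lambda: 6 * 7, [0.3, 0.01, 0.2])` returns the task's result together with the index of the fastest simulated resource. No information about the delays is used to pick a resource, which is the property that makes replication attractive when run times cannot be predicted; the two slower replicas represent the wasted cycles.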