Task Scheduling Algorithm for Multicore Processor System for Minimizing Recovery Time in Case of Single Node Fault

Authors:
Shohei Gotoda;Minoru Ito;Naoki Shibata
Venue:
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Year:
2012

Abstract

In this paper, we propose a task scheduling algorithm for multicore processor systems that reduces recovery time after a single fail-stop failure of a multicore processor. Many recently developed processors have multiple cores on a single die, so the failure of one computing node brings down many processors at once. When a multicore processor fails, all tasks that were executed on it have to be recovered at once. The proposed algorithm builds on an existing checkpointing technique, and we assume that state is saved when a node sends its results to the next node. If a series of computations in which each step depends on earlier results is executed on a single die, every part of that series must be executed again when the processor fails. The proposed scheduling algorithm therefore tries to avoid concentrating tasks on the processors of a single die. We designed our algorithm as a parallel algorithm that achieves O(n) speedup, where n is the number of processors. We evaluated our method using simulations and experiments with four PCs. Compared with an existing scheduling method, in the simulations the execution time including recovery time in the case of a node failure was reduced by up to 50%, while the overhead in the case of no failure was a few percent in typical scenarios.
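
The intuition behind avoiding concentration on one die can be illustrated with a toy model. The sketch below is not the algorithm proposed in the paper. It assumes a purely linear chain of dependent tasks, treats each die as one node, and assumes state is checkpointed only when a result crosses from one die to another. Under those assumptions, the worst-case work to redo after a single die failure is the most expensive run of consecutive tasks placed on the same die. The function name `worst_case_recovery`, the task costs, and the die assignments are all hypothetical.

```python
from itertools import groupby


def worst_case_recovery(costs, dies):
    """Return the largest total cost of consecutive chain tasks on one die.

    Toy model (an assumption, not the paper's method): a checkpoint exists
    only where the chain moves to a different die, so a die failure forces
    re-execution of the whole uncheckpointed run on that die.
    """
    if len(costs) != len(dies):
        raise ValueError("costs and dies must have the same length")
    worst = 0
    start = 0
    # groupby yields one group per run of identical consecutive die ids.
    for _die, run in groupby(dies):
        length = len(list(run))
        worst = max(worst, sum(costs[start:start + length]))
        start += length
    return worst


# Hypothetical chain of four dependent tasks with made-up execution costs.
costs = [4, 3, 5, 2]
concentrated = [0, 0, 0, 0]  # every task on die 0
spread = [0, 1, 0, 1]        # tasks alternate between die 0 and die 1

print(worst_case_recovery(costs, concentrated))  # 14
print(worst_case_recovery(costs, spread))        # 5
```

The model ignores communication cost between dies, which is the source of the no-failure overhead mentioned in the abstract, and it ignores parallel branches in the task graph. A real scheduler has to weigh both.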