Restoring data operations after a disaster is a daunting task: how should recovery be performed to minimize data loss and application downtime? Administrators are under considerable pressure to recover quickly, so they lack time to make good scheduling decisions. They schedule recovery based on rules of thumb, or on pre-determined orders that might not be best for the failure occurrence. With multiple workloads and recovery techniques, the number of possibilities is large, so the decision process is not trivial.

This paper makes several contributions to the area of data recovery scheduling. First, we formalize the description of potential recovery processes by defining recovery graphs. Recovery graphs explicitly capture alternative approaches for recovering workloads, including their recovery tasks, operational states, timing information and precedence relationships. Second, we formulate the data recovery scheduling problem as an optimization problem, where the goal is to find the schedule that minimizes the financial penalties due to downtime, data loss and vulnerability to subsequent failures. Third, we present several methods for finding optimal or near-optimal solutions, including priority-based, randomized and genetic algorithm-guided ad hoc heuristics. We quantitatively evaluate these methods using realistic storage system designs and workloads, and compare the quality of the algorithms' solutions to optimal solutions provided by a math programming formulation and to the solutions from a simple heuristic that emulates the choices made by human administrators. We find that our heuristics' solutions improve on the administrator heuristic's solutions, often approaching or achieving optimality.
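
To make the scheduling objective concrete, the following is a minimal sketch of a drastically simplified version of the problem. It is not the paper's recovery-graph model or any of its algorithms. It assumes that workloads are recovered one at a time, that each workload has a single recovery task, and that the only penalty is a per-hour downtime charge. The workload names, durations, and penalty rates are hypothetical. The priority rule shown is the classic ratio rule for weighted completion time (Smith's rule), swapped in here to illustrate a priority-based heuristic; under these assumptions it can be checked against a brute-force optimum.

```python
from itertools import permutations

# Hypothetical workloads: name -> (recovery_hours, penalty_dollars_per_hour).
# These values are illustrative assumptions, not data from the paper.
WORKLOADS = {
    "orders_db": (4.0, 500.0),
    "web_cache": (1.0, 200.0),
    "archive": (6.0, 50.0),
}


def total_penalty(order, workloads):
    """Total downtime penalty when workloads are recovered serially in `order`."""
    elapsed = 0.0
    penalty = 0.0
    for name in order:
        hours, rate = workloads[name]
        elapsed += hours           # the workload stays down until its recovery ends
        penalty += rate * elapsed  # it accrues its penalty rate for the whole outage
    return penalty


def priority_order(workloads):
    """Priority heuristic: highest penalty rate per hour of recovery work first."""
    return sorted(
        workloads,
        key=lambda name: workloads[name][1] / workloads[name][0],
        reverse=True,
    )


def best_order(workloads):
    """Brute-force optimum; only feasible for a handful of workloads."""
    return min(permutations(workloads), key=lambda o: total_penalty(o, workloads))


if __name__ == "__main__":
    fixed = ["archive", "orders_db", "web_cache"]  # a pre-determined order
    print("fixed order penalty:   ", total_penalty(fixed, WORKLOADS))
    print("priority order penalty:", total_penalty(priority_order(WORKLOADS), WORKLOADS))
    print("optimal order penalty: ", total_penalty(best_order(WORKLOADS), WORKLOADS))
```

In this toy setting the pre-determined order is far more costly than the priority order, which matches the brute-force optimum. The problem described in the abstract is harder: alternative recovery techniques, precedence relationships, and data-loss and vulnerability penalties all enter the objective, which is why the abstract turns to randomized and genetic algorithm-guided search and to a math programming formulation for comparison.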