SWEEPER: an efficient disaster recovery point identification mechanism

  • Authors:
  • Akshat Verma;Kaladhar Voruganti;Ramani Routray;Rohit Jain

  • Affiliations:
  • IBM India Research;Network Appliance;IBM Almaden Research;Yahoo India

  • Venue:
  • FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Data corruption is one of the key problems that is on top of the radar screen of most CIOs. Continuous Data Protection (CDP) technologies help enterprises deal with data corruption by maintaining multiple versions of data and facilitating recovery by allowing an administrator restore to an earlier clean version of data. The aim of the recovery process after data corruption is to quickly traverse through the backup copies (old versions), and retrieve a clean copy of data. Currently, data recovery is an ad-hoc, time consuming and frustrating process with sequential brute force approaches, where recovery time is proportional to the number of backup copies examined and the time to check a backup copy for data corruption. In this paper, we present the design and implementation of SWEEPER architecture and backup copy selection algorithms that specifically tackle the problem of quickly and systematically identifying a good recovery point. We monitor various system events and generate checkpoint records that help in quickly identifying a clean backup copy. The SWEEPER methodology dynamically determines the selection algorithm based on user specified recovery time and recovery point objectives, and thus, allows system administrators to perform trade-offs between recovery time and data currentness. We have implemented our solution as part of a popular Storage Resource Manager product and evaluated SWEEPER under many diverse settings. Our study clearly establishes the effectiveness of SWEEPER as a robust strategy to significantly reduce recovery time.