Memory efficient sanitization of a deduplicated storage system

  • Authors:
  • Fabiano C. Botelho;Philip Shilane;Nitin Garg;Windsor Hsu

  • Affiliations:
  • Backup Recovery Systems Division, EMC Corporation;Backup Recovery Systems Division, EMC Corporation;Backup Recovery Systems Division, EMC Corporation;Backup Recovery Systems Division, EMC Corporation

  • Venue:
  • FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
  • Year:
  • 2013

Abstract

Sanitization is the process of securely erasing sensitive data from a storage system, effectively restoring the system to a state as if the sensitive data had never been stored. Depending on the threat model, sanitization could require erasing all unreferenced blocks. This is particularly challenging in deduplicated storage systems because each piece of data on the physical media could be referred to by multiple namespace objects. For large storage systems, where available memory is a small fraction of storage capacity, the data structures used by standard techniques for tracking data references will not fit in memory. We therefore discuss multiple sanitization techniques that trade off I/O and memory requirements. We make three key contributions. First, we provide an understanding of the threat model and what is required to sanitize a deduplicated storage system as compared to a device. Second, we have designed a memory-efficient algorithm using perfect hashing that requires only 2.54 to 2.87 bits per reference (98% savings) while minimizing the amount of I/O. Third, we present a complete sanitization design for EMC Data Domain.
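To make the idea of tracking references with perfect hashing concrete, the following is a minimal Python sketch and not the authors' algorithm. The abstract states only that perfect hashing is used. The hash-and-displace construction, the function names (`build_perfect_hash`, `lookup`, `find_dead_chunks`), the bucket size, and the use of BLAKE2b are all assumptions made for illustration. The sketch assumes the full set of stored chunk fingerprints is known and distinct, and that every live reference points to a fingerprint in that set. It stores one integer seed per bucket, so it does not attempt to reach the 2.54 to 2.87 bits per reference reported above.

```python
import hashlib


def _h(key: bytes, seed: int, m: int) -> int:
    """Seeded hash of key into the range [0, m)."""
    digest = hashlib.blake2b(
        key, digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % m


def build_perfect_hash(keys):
    """Hash-and-displace: find one seed per bucket so that every key
    lands in its own slot in [0, len(keys)). Keys must be distinct."""
    n = len(keys)
    num_buckets = max(1, n // 4)  # assumed average bucket size of 4
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[_h(k, 0, num_buckets)].append(k)  # seed 0 picks the bucket

    seeds = [0] * num_buckets
    taken = [False] * n
    # Place the largest buckets first, while most slots are still free.
    for b in sorted(range(num_buckets), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            continue
        seed = 1
        while True:
            slots = [_h(k, seed, n) for k in buckets[b]]
            if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                break
            seed += 1
        seeds[b] = seed
        for s in slots:
            taken[s] = True
    return seeds


def lookup(seeds, n, key):
    """Slot of key in [0, n); only meaningful for keys used at build time."""
    return _h(key, seeds[_h(key, 0, len(seeds))], n)


def find_dead_chunks(all_fingerprints, live_references):
    """Mark live chunks in a bit vector indexed by the perfect hash,
    then return the fingerprints that were never marked."""
    n = len(all_fingerprints)
    seeds = build_perfect_hash(all_fingerprints)
    live = bytearray((n + 7) // 8)  # one bit per stored chunk
    for fp in live_references:  # duplicates are expected with deduplication
        i = lookup(seeds, n, fp)
        live[i // 8] |= 1 << (i % 8)

    def is_live(fp):
        i = lookup(seeds, n, fp)
        return (live[i // 8] >> (i % 8)) & 1 == 1

    return [fp for fp in all_fingerprints if not is_live(fp)]
```

Calling `find_dead_chunks(fingerprints, references)` with a list of distinct fingerprints and an iterable of live references returns the fingerprints that no namespace object refers to, which are the candidates for erasure. Because the hash is collision-free over the known fingerprint set, the liveness vector needs only one bit per chunk and never stores the fingerprints themselves.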