Sanitization is the process of securely erasing sensitive data from a storage system, effectively restoring the system to a state as if the sensitive data had never been stored. Depending on the threat model, sanitization may require erasing all unreferenced blocks. This is particularly challenging in deduplicated storage systems, where each piece of data on the physical media may be referenced by multiple namespace objects. In large storage systems, where available memory is a small fraction of storage capacity, standard techniques for tracking data references do not fit in memory, so we discuss several sanitization techniques that trade off I/O and memory requirements. We make three key contributions. First, we provide an understanding of the threat model and of what is required to sanitize a deduplicated storage system, as compared to a single device. Second, we design a memory-efficient algorithm based on perfect hashing that requires only 2.54 to 2.87 bits per reference (a 98% memory savings) while minimizing I/O. Third, we present a complete sanitization design for EMC Data Domain.
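The core idea of the perfect-hashing approach can be sketched as follows: build a perfect hash over the set of stored chunk fingerprints, keep one mark bit per slot, walk the namespace marking every referenced chunk, and treat unmarked slots as candidates for secure erasure. This is a minimal illustrative sketch, not the paper's implementation: a plain dictionary stands in for a true minimal perfect hash function (a real system would use a compact construction such as CHD or BDZ at roughly 2-3 bits per key), and the function names are hypothetical.

```python
def build_mph(fingerprints):
    # Stand-in for a minimal perfect hash: maps each fingerprint to a
    # unique slot in [0, n). A real MPH achieves this in ~2-3 bits/key
    # without storing the keys themselves.
    return {fp: i for i, fp in enumerate(sorted(fingerprints))}

def find_unreferenced(stored, live_references):
    # Return fingerprints present on physical media that no namespace
    # object references -- the blocks sanitization must erase.
    mph = build_mph(stored)
    marked = bytearray((len(stored) + 7) // 8)  # one mark bit per chunk
    for fp in live_references:                   # namespace walk
        slot = mph[fp]
        marked[slot // 8] |= 1 << (slot % 8)
    return [fp for fp in stored
            if not (marked[mph[fp] // 8] >> (mph[fp] % 8)) & 1]
```

For example, with stored chunks `["a1", "b2", "c3", "d4"]` and live references `["a1", "c3"]`, the sketch reports `["b2", "d4"]` as unreferenced. The memory cost is dominated by the mark bit vector (1 bit per stored chunk) plus the perfect hash itself, which is what keeps the per-reference footprint in the low single digits of bits.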