Sanitization is the process of securely erasing sensitive data from a storage system, effectively restoring the system to a state as if the sensitive data had never been stored. Depending on the threat model, sanitization may require erasing all unreferenced blocks. This is particularly challenging in deduplicated storage systems, where each piece of data on the physical media may be referenced by multiple namespace objects. In large storage systems, where available memory is a small fraction of storage capacity, standard techniques for tracking data references do not fit in memory, so we discuss several sanitization techniques that trade off I/O and memory requirements. We make three key contributions. First, we provide an understanding of the threat model and of what is required to sanitize a deduplicated storage system, as compared to a single device. Second, we design a memory-efficient algorithm based on perfect hashing that requires only 2.54 to 2.87 bits per reference (a 98% memory savings) while minimizing I/O. Third, we present a complete sanitization design for EMC Data Domain.
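The core idea of the perfect-hashing approach can be sketched as follows: build a perfect hash over the set of stored chunk fingerprints, keep one mark bit per slot, walk the namespace marking every referenced chunk, and treat unmarked slots as candidates for secure erasure. This is a minimal illustrative sketch, not the paper's implementation: a plain dictionary stands in for a true minimal perfect hash function (a real system would use a compact construction such as CHD or BDZ at roughly 2-3 bits per key), and the function names are hypothetical.

```python
def build_mph(fingerprints):
    # Stand-in for a minimal perfect hash: maps each fingerprint to a
    # unique slot in [0, n). A real MPH achieves this in ~2-3 bits/key
    # without storing the keys themselves.
    return {fp: i for i, fp in enumerate(sorted(fingerprints))}

def find_unreferenced(stored, live_references):
    # Return fingerprints present on physical media that no namespace
    # object references -- the blocks sanitization must erase.
    mph = build_mph(stored)
    marked = bytearray((len(stored) + 7) // 8)  # one mark bit per chunk
    for fp in live_references:                   # namespace walk
        slot = mph[fp]
        marked[slot // 8] |= 1 << (slot % 8)
    return [fp for fp in stored
            if not (marked[mph[fp] // 8] >> (mph[fp] % 8)) & 1]
```

For example, with stored chunks `["a1", "b2", "c3", "d4"]` and live references `["a1", "c3"]`, the sketch reports `["b2", "d4"]` as unreferenced. The memory cost is dominated by the mark bit vector (1 bit per stored chunk) plus the perfect hash itself, which is what keeps the per-reference footprint in the low single digits of bits.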