Quick Estimation of Data Compression and De-duplication for Large Storage Systems

  • Authors:
  • Cornel Constantinescu; Maohua Lu

  • Venue:
  • CCP '11 Proceedings of the 2011 First International Conference on Data Compression, Communications and Processing
  • Year:
  • 2011

Abstract

Many new storage systems provide some form of data reduction. In a recent paper we investigated how compression and de-duplication can be combined in primary storage systems serving active data. In this paper we try to answer the question anyone would ask before upgrading to a new, data-reduction-enabled storage server: how much storage savings would the new system offer for the data I have stored right now? We investigate methods to quickly estimate the storage savings potential of the customary data reduction methods used in storage systems, compression and full-file de-duplication, on large-scale storage systems. We show that the compression ratio achievable on a large storage system can be estimated precisely with just a couple of percent (worst case) of the work required to compress every file in the system. We also show that full-file duplicates can be discovered very quickly, with at most 4% error, by a robust heuristic.
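The abstract does not spell out the estimators themselves. As an illustration only, the two ideas can be sketched as (a) compressing a small random sample of files and extrapolating the ratio, and (b) grouping files by size plus a hash of a short prefix to find full-file duplicate candidates cheaply. The choice of zlib as the compressor, the sample fraction, and the prefix-hash grouping are my assumptions for the sketch, not the authors' exact methods:

```python
import hashlib
import os
import random
import zlib
from collections import defaultdict

def estimate_compression_ratio(file_paths, sample_fraction=0.02, seed=0):
    """Estimate the system-wide compression ratio by compressing only a
    random sample of files instead of every file in the store.
    (Illustrative sketch; zlib and the 2% default are assumptions.)"""
    rng = random.Random(seed)
    k = max(1, int(len(file_paths) * sample_fraction))
    sample = rng.sample(file_paths, k)

    raw_bytes = compressed_bytes = 0
    for path in sample:
        with open(path, "rb") as f:
            data = f.read()
        raw_bytes += len(data)
        compressed_bytes += len(zlib.compress(data))
    return compressed_bytes / raw_bytes if raw_bytes else 1.0

def candidate_duplicates(file_paths, prefix_bytes=4096):
    """Group files by (size, hash of the first prefix_bytes) as a cheap
    proxy for full-file identity, avoiding reading whole files.
    (Illustrative heuristic, not the paper's exact one.)"""
    groups = defaultdict(list)
    for path in file_paths:
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            head = hashlib.sha1(f.read(prefix_bytes)).hexdigest()
        groups[(size, head)].append(path)
    # Only groups with more than one member are duplicate candidates.
    return [g for g in groups.values() if len(g) > 1]
```

A sketch like this trades accuracy for speed in exactly the way the paper quantifies: the sample-based ratio is only an estimate, and prefix-hash grouping can report false positives (same size and prefix, different tails), which would need a full comparison to confirm.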