ViDeDup: an application-aware framework for video de-duplication

Authors:
Atul Katiyar;Jon Weissman
Affiliations:
Windows Live, Microsoft Corporation, Redmond, WA;Department of Computer Science and Engineering, University of Minnesota Twin Cities, MN
Venue:
HotStorage'11 Proceedings of the 3rd USENIX conference on Hot topics in storage and file systems
Year:
2011

Citing 5

A low-bandwidth network file system
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles

On the Evolution of Clusters of Near-Duplicate Web Pages
LA-WEB '03 Proceedings of the First Conference on Latin American Web Congress

Video copy detection: a comparative study
Proceedings of the 6th ACM international conference on Image and video retrieval

Avoiding the disk bottleneck in the data domain deduplication file system
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies

Characterizing datasets for data deduplication in backup applications
IISWC '10 Proceedings of the IEEE International Symposium on Workload Characterization (IISWC'10)

Cited 1

Analyzing compute vs. storage tradeoff for video-aware storage efficiency
HotStorage'12 Proceedings of the 4th USENIX conference on Hot Topics in Storage and File Systems

Abstract

Key to the compression capability of a data deduplication system is its definition of redundancy. Traditionally, two data items are considered redundant if their underlying bit-streams are identical. However, this notion of redundancy is too strict for many applications. For example, on a video storage platform, two videos encoded in different formats would be unique at the system level but redundant at the content level. Intuitively, introducing application-level intelligence into redundancy detection can yield improved data compression. We propose ViDeDup (Video De-Duplication), a novel framework for video de-duplication based on an application-level view of redundancy. The framework goes beyond duplicate-data detection to similarity detection, thereby providing application-level knobs for defining the acceptable level of noise during replica detection. Our results show that, by trading CPU for storage, a 45% reduction in storage space can be achieved, compared with the 8% yielded by system-level de-duplication, for a dataset collected from video sharing sites on the Web. We also present a tradeoff analysis of the system's tunable parameters, showing how to tune the system for performance, compression, and quality.
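The contrast between the two notions of redundancy can be illustrated with a minimal Python sketch. It is not the ViDeDup algorithm, which the abstract does not specify: the per-video integer signatures, the function names, and the `tolerance` and `threshold` parameters are all hypothetical stand-ins for whatever content features and knobs a real system would use.

```python
import hashlib


def exact_duplicates(items):
    """System-level view: group items whose byte streams are identical."""
    groups = {}
    for name, data in items.items():
        digest = hashlib.sha256(data).hexdigest()
        groups.setdefault(digest, []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def similarity(sig_a, sig_b, tolerance):
    """Fraction of aligned signature values that differ by at most `tolerance`.

    The signatures are hypothetical content features (one integer per
    sampled frame); `tolerance` is an assumed noise knob.
    """
    n = min(len(sig_a), len(sig_b))
    if n == 0:
        return 0.0
    close = sum(1 for a, b in zip(sig_a, sig_b) if abs(a - b) <= tolerance)
    return close / n


def near_duplicates(signatures, tolerance, threshold):
    """Application-level view: pairs whose similarity reaches `threshold`."""
    names = sorted(signatures)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = similarity(signatures[first], signatures[second], tolerance)
            if score >= threshold:
                pairs.append((first, second))
    return pairs


# Made-up data: the same clip stored in two formats, plus a byte-identical copy.
files = {
    "clip.avi": b"\x00\x01\x02",
    "clip.mp4": b"\x10\x11\x12",
    "copy.avi": b"\x00\x01\x02",
}
signatures = {
    "clip.avi": [10, 52, 200, 31],
    "clip.mp4": [11, 50, 198, 33],
    "copy.avi": [10, 52, 200, 31],
    "other.mp4": [90, 5, 17, 240],
}

print(exact_duplicates(files))
print(near_duplicates(signatures, tolerance=3, threshold=0.75))
```

In this toy data, byte hashing links only the identical copy, while the signature comparison also links the re-encoded clip. Raising `tolerance` or lowering `threshold` admits noisier matches, which is the kind of compression-versus-quality trade-off the abstract describes.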