File routing middleware for cloud deduplication

  • Authors: Petros Efstathopoulos
  • Affiliations: Symantec Research Labs, Symantec Corporation, Culver City, CA
  • Venue: Proceedings of the 2nd International Workshop on Cloud Computing Platforms
  • Year: 2012

Abstract

Deduplication technology is maturing and becoming a standard feature of most storage architectures. Many approaches have been proposed to address the deduplication scalability challenges of privately owned storage infrastructure, but as storage moves to the cloud, deduplication mechanisms are expected to scale to thousands of storage nodes. Currently available solutions were not designed to handle such a large scale, while research and practical experience suggest that aiming for global deduplication among thousands of nodes will almost certainly lead to high complexity, reduced performance and reduced reliability. Instead, we propose performing local deduplication operations within each cloud node, and introduce file similarity metrics to determine which node is the best deduplication host for a particular incoming file. This approach reduces the problem of scalable cloud deduplication to a file routing problem, which we can address using a software layer capable of making the necessary routing decisions. Using the proposed file routing middleware layer, the system can achieve three important properties: scale to thousands of nodes, support almost any type of underlying storage node, and make the most of file-level deduplication.
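
The routing idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity metric, so everything below is an assumption made for illustration: fixed-size chunking, a per-file sketch built from the smallest SHA-1 chunk fingerprints, overlap count as the similarity score, and the hypothetical names `file_sketch` and `FileRouter`. It is not the paper's algorithm, only one plausible way a middleware layer might pick a deduplication host.

```python
import hashlib

CHUNK_SIZE = 4096   # assumption: fixed-size chunks
SKETCH_SIZE = 8     # assumption: fingerprints kept per file sketch


def file_sketch(data: bytes, chunk_size: int = CHUNK_SIZE,
                k: int = SKETCH_SIZE) -> frozenset:
    """Return the k smallest chunk fingerprints as a compact file summary."""
    fingerprints = {
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }
    return frozenset(sorted(fingerprints)[:k])


class FileRouter:
    """Hypothetical routing layer: each node deduplicates locally,
    the router only decides which node receives a file."""

    def __init__(self, node_ids):
        self.node_sketches = {n: set() for n in node_ids}
        self.node_load = {n: 0 for n in node_ids}

    def route(self, data: bytes) -> str:
        sketch = file_sketch(data)
        # Prefer the node sharing the most fingerprints with the file;
        # break ties by choosing the node holding the fewest files.
        best = max(
            self.node_sketches,
            key=lambda n: (len(sketch & self.node_sketches[n]),
                           -self.node_load[n]),
        )
        self.node_sketches[best] |= sketch
        self.node_load[best] += 1
        return best
```

In this sketch the router keeps only small per-node summaries, so no global fingerprint index is needed; a file that resembles data already held by a node is sent there, and unrelated files spread across the least-loaded nodes.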