Randomized Protocols for Duplicate Elimination in Peer-to-Peer Storage Systems

Authors:
Ronaldo A. Ferreira;Murali K. Ramanathan;Ananth Grama;Suresh Jagannathan
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2007

Abstract

Distributed peer-to-peer systems rely on voluntary participation of peers to effectively manage a storage pool. In such systems, data is generally replicated for performance and availability. If the storage associated with replication is not monitored and provisioned, the underlying benefits may not be realized. Resource constraints, performance scalability, and availability present diverse considerations. Availability and performance scalability, in terms of response time, are improved by aggressive replication, whereas resource constraints limit total storage in the network. Identification and elimination of redundant data pose fundamental problems for such systems. In this paper, we present a novel and efficient solution that addresses availability and scalability with respect to management of redundant data. Specifically, we address the problem of duplicate elimination in the context of systems connected over an unstructured peer-to-peer network in which there is no a priori binding between an object and its location. We propose two randomized protocols to solve this problem in a scalable and decentralized fashion that does not compromise the availability requirements of the application. Performance results using both large-scale simulations and a prototype built on PlanetLab demonstrate that our protocols provide high probabilistic guarantees while incurring minimal administrative overheads.