Randomized algorithms
PODC '97 Proceedings of the sixteenth annual ACM symposium on Principles of distributed computing
A large-scale study of file-system contents
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Chord: A scalable peer-to-peer lookup service for internet applications
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
A scalable content-addressable network
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Distributed Algorithms
Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems
Middleware '01 Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg
Towards an Archival Intermemory
ADL '98 Proceedings of the Advances in Digital Libraries Conference
Reclaiming Space from Duplicate Files in a Serverless Distributed File System
ICDCS '02 Proceedings of the 22 nd International Conference on Distributed Computing Systems (ICDCS'02)
Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and
Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and
Samsara: honor among thieves in peer-to-peer storage
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Estimating network size from local information
Information Processing Letters
Percolation Search in Power Law Networks: Making Unstructured Peer-to-Peer Networks Scalable
P2P '04 Proceedings of the Fourth International Conference on Peer-to-Peer Computing
Fastest Mixing Markov Chain on a Graph
SIAM Review
Farsite: federated, available, and reliable storage for an incompletely trusted environment
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Taming aggressive replication in the Pangaea wide-area file system
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Pastiche: making backup cheap and easy
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Search with Probabilistic Guarantees in Unstructured Peer-to-Peer Networks
P2P '05 Proceedings of the Fifth IEEE International Conference on Peer-to-Peer Computing
Random walks in peer-to-peer networks: algorithms and evaluation
Performance Evaluation - P2P computing systems
A cooperative internet backup scheme
ATEC '03 Proceedings of the annual conference on USENIX Annual Technical Conference
Redundancy elimination within large collections of files
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Alternatives for detecting redundancy in storage systems data
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Single instance storage in Windows® 2000
WSS'00 Proceedings of the 4th conference on USENIX Windows Systems Symposium - Volume 4
Maintaining replicas in unstructured P2P systems
CoNEXT '08 Proceedings of the 2008 ACM CoNEXT Conference
SAFE: A Source Deduplication Framework for Efficient Cloud Backup Services
Journal of Signal Processing Systems
Hi-index | 0.00 |
Distributed peer-to-peer systems rely on voluntary participation of peers to effectively manage a storage pool. In such systems, data is generally replicated for performance and availability. If the storage associated with replication is not monitored and provisioned, the underlying benefits may not be realized. Resource constraints, performance scalability, and availability present diverse considerations. Availability and performance scalability, in terms of response time, are improved by aggressive replication, whereas resource constraints limit total storage in the network. Identification and elimination of redundant data pose fundamental problems for such systems. In this paper, we present a novel and efficient solution that addresses availability and scalability with respect to management of redundant data. Specifically, we address the problem of duplicate elimination in the context of systems connected over an unstructured peer-to-peer network in which there is no a priori binding between an object and its location. We propose two randomized protocols to solve this problem in a scalable and decentralized fashion that does not compromise the availability requirements of the application. Performance results using both large-scale simulations and a prototype built on PlanetLab demonstrate that our protocols provide high probabilistic guarantees while incurring minimal administrative overheads.