Caju: a content distribution system for edge networks

Authors:
Guthemberg Silvestre;Sébastien Monnet;Ruby Krishnaswamy;Pierre Sens
Affiliations:
LIP6/UPMC/CNRS/INRIA, Paris, France, Orange Labs, Issy-les-Moulineaux, France;LIP6/UPMC/CNRS/INRIA, Paris, France;Orange Labs, Issy-les-Moulineaux, France;LIP6/UPMC/CNRS/INRIA, Paris, France
Venue:
Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Year:
2012




Abstract

Users increasingly store their data in the cloud. When that content is retrieved, the retrieval has to respect quality of service (QoS) constraints. To reduce transfer latency, data is replicated: the idea is to place data close to users and to take advantage of providers' home storage. However, to minimize the cost of their platforms, cloud providers need to limit storage usage, which is even more crucial for large content. This problem is hard because popularity is distributed highly non-uniformly across the stored data: some pieces of data will never be accessed, while others may be retrieved thousands of times. Thus, the trade-off between storage usage and the QoS of data retrieval has to take data popularity into account. This paper presents our architecture, which gathers several storage domains composed of small-sized datacenters and edge devices, and shows the importance of adapting the replication degree to data popularity. Our simulations, using realistic workloads, show that a simple cache mechanism provides an eight-fold decrease in the number of SLA violations, requires up to 10 times less storage capacity for replicas, and reduces aggregate bandwidth and the number of flows by half.
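
To illustrate why popularity matters for the storage/QoS trade-off, the following minimal sketch compares the hit ratio of a small cache under uniform and skewed request popularity. It is not the mechanism evaluated in the paper: the abstract does not specify the cache policy or the workload, so the LRU eviction policy, the Zipf-like popularity model, and all names and parameters (`LRUCache`, `hit_ratio`, `num_items`, `capacity`, `skew`) are assumptions made for illustration only.

```python
from collections import OrderedDict
import random


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used item.

    Assumption: LRU stands in for the "simple cache mechanism";
    the abstract does not name the actual policy.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss, store the key and return False."""
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return True
        self.items[key] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
        return False


def hit_ratio(num_items=1000, capacity=100, requests=20000, skew=1.0, seed=0):
    """Replay a synthetic request trace and return the fraction of cache hits.

    Assumption: popularity follows a Zipf-like law, weight = 1 / rank**skew.
    skew=0.0 gives uniform popularity; larger values give a heavier skew.
    """
    rng = random.Random(seed)
    weights = [1.0 / (rank ** skew) for rank in range(1, num_items + 1)]
    trace = rng.choices(range(num_items), weights=weights, k=requests)
    cache = LRUCache(capacity)
    hits = sum(cache.access(item) for item in trace)
    return hits / requests


if __name__ == "__main__":
    print("uniform popularity:", hit_ratio(skew=0.0))
    print("skewed popularity: ", hit_ratio(skew=1.0))
```

Under these assumptions, the same cache capacity serves a larger share of requests when popularity is skewed, because a few items account for most accesses. This is the intuition behind tying the number of replicas to how often a piece of data is actually requested.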