Cross-datacenter data replication is widely used in geo-cloud environments because it increases application availability and improves performance. At cloud scale, however, it is difficult to decide where to place replicas among datacenters so as to minimize overall user access latency. Correlation among data items makes the replica placement problem more complex. To address these scale and data correlation issues, we propose a two-step approach called GCplace. Before GCplace is applied, a network coordinate system is used to predict the latency between all users and datacenter nodes. In the first step of GCplace, we introduce stream-based similarity clustering, which uses a small number of micro-clusters to represent a huge number of users and thus significantly reduces the cost of the replica placement algorithm. In the second step, an iterative algorithm is used to obtain an approximate solution. We evaluated our approach on a large-scale real network latency dataset. Comprehensive experiments show that GCplace can significantly reduce average user access latency.
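The two-step idea can be illustrated with a small sketch. The abstract does not give the details of GCplace, so everything below is an assumption made for illustration: users and datacenters are points in a 2-D network coordinate space, Euclidean distance stands in for predicted latency, a fixed `radius` threshold decides when a user joins an existing micro-cluster, and a simple swap-based local search is substituted for the paper's iterative algorithm. Data correlation is not modeled. The names `MicroCluster`, `micro_cluster`, `total_latency`, and `place_replicas` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

Coord = Tuple[float, float]  # assumed 2-D network coordinate


@dataclass
class MicroCluster:
    """Running summary of a group of users: count and coordinate sums."""
    n: int
    sx: float
    sy: float

    @property
    def center(self) -> Coord:
        return (self.sx / self.n, self.sy / self.n)


def micro_cluster(users: Sequence[Coord], radius: float) -> List[MicroCluster]:
    """Step 1 (sketch): one pass over the user stream.

    A user joins the nearest micro-cluster if it lies within `radius`
    of that cluster's center; otherwise it starts a new micro-cluster.
    """
    clusters: List[MicroCluster] = []
    for u in users:
        best = min(clusters, key=lambda c: math.dist(c.center, u), default=None)
        if best is not None and math.dist(best.center, u) <= radius:
            best.n += 1
            best.sx += u[0]
            best.sy += u[1]
        else:
            clusters.append(MicroCluster(1, u[0], u[1]))
    return clusters


def total_latency(clusters: Sequence[MicroCluster],
                  dcs: Sequence[Coord], chosen: Set[int]) -> float:
    """Sum of predicted latency to the closest replica, weighted by user count."""
    return sum(c.n * min(math.dist(c.center, dcs[i]) for i in chosen)
               for c in clusters)


def place_replicas(clusters: Sequence[MicroCluster],
                   dcs: Sequence[Coord], k: int) -> List[int]:
    """Step 2 (sketch): swap-based local search over k replica sites.

    Starts from the first k datacenters and repeatedly applies any single
    swap that strictly lowers the total latency, until none exists.
    """
    chosen = set(range(k))
    improved = True
    while improved:
        improved = False
        current = total_latency(clusters, dcs, chosen)
        for out in sorted(chosen):
            for cand in range(len(dcs)):
                if cand in chosen:
                    continue
                trial = (chosen - {out}) | {cand}
                if total_latency(clusters, dcs, trial) < current:
                    chosen = trial
                    improved = True
                    break
            if improved:
                break
    return sorted(chosen)


# Toy data: two user groups and three candidate datacenters.
users = [(0.0, 0.0), (1.0, 0.0), (100.0, 0.0), (99.0, 0.0)]
datacenters = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0)]
clusters = micro_cluster(users, radius=5.0)
replicas = place_replicas(clusters, datacenters, k=2)
```

Because the placement step iterates over micro-clusters rather than individual users, its cost per evaluation depends on the number of micro-clusters, which is the saving the abstract attributes to the clustering step. The local search here only reaches a local optimum and makes no claim about the approximation quality of the original method.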