Latency-aware data partitioning for geo-replicated online social networks

Authors:
Lei Jiao;Tianyin Xu;Jun Li;Xiaoming Fu
Affiliations:
University of Goettingen;U.C. San Diego;University of Oregon;University of Goettingen
Venue:
Proceedings of the Workshop on Posters and Demos Track
Year:
2011

Citing 3
Cited 0

The little engine(s) that could: scaling online social networks
Proceedings of the ACM SIGCOMM 2010 conference

Volley: automated data placement for geo-distributed cloud services
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation

Multilevel algorithms for partitioning power-law graphs
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Abstract

Large-scale Online Social Networks (OSNs) usually replicate data across multiple datacenters in different geographic locations to ensure high availability and performance [1]. The de facto replication method in current OSNs (e.g., Facebook) is full replication, in which each geo-distributed datacenter maintains a copy of all the data. Full replication achieves good performance in a simple way but incurs high maintenance overhead (e.g., replica storage and synchronization). First, total storage grows linearly with the number of datacenters deployed, so the approach scales poorly. Second, the replicas at all locations must be kept synchronized, which generates a large volume of expensive inter-datacenter WAN traffic. The ideal solution is to partition user data across the datacenters so that each geo-distributed datacenter maintains one partition of the whole data set. Unfortunately, partitioning OSN data with traditional graph algorithms is known to be very difficult because of the high interconnection and interdependency within OSN data [2]. Moreover, geo-partitioning goes beyond traditional graph partitioning because user-perceived latency is a critical Quality-of-Service (QoS) issue that must be taken into account.
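
The storage and synchronization argument above can be illustrated with a small back-of-the-envelope sketch. It is not taken from the paper: the function names, the 100 TB data set size, and the idealized assumption that a partitioned deployment keeps exactly one copy of each item are all hypothetical and chosen only to show the scaling behavior.

```python
def full_replication_storage(dataset_tb: float, num_datacenters: int) -> float:
    """Total storage when every datacenter keeps a full copy of the data."""
    return dataset_tb * num_datacenters


def partitioned_storage(dataset_tb: float, num_datacenters: int) -> float:
    """Total storage when the data set is split into disjoint partitions.

    Idealized assumption: no item is stored in more than one datacenter,
    so the total does not depend on the number of datacenters.
    """
    return dataset_tb


def remote_copies_per_write(num_datacenters: int, fully_replicated: bool) -> int:
    """Number of remote datacenters a single write must be propagated to."""
    return num_datacenters - 1 if fully_replicated else 0


if __name__ == "__main__":
    dataset_tb = 100.0  # hypothetical data set size, for illustration only
    for n in (2, 4, 8):
        print(
            n,
            full_replication_storage(dataset_tb, n),
            partitioned_storage(dataset_tb, n),
            remote_copies_per_write(n, fully_replicated=True),
        )
```

Under these assumptions, doubling the number of datacenters doubles the storage footprint of full replication and adds one remote copy per write for every new site, while the idealized partitioned layout stays flat. A real partitioning scheme would likely sit between the two extremes, since some replication is usually kept to serve requests that cross partition boundaries.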