The most common type of query in online social networks is the news feed of friends' recent activities. These queries retrieve many small records generated by different users in the network, and their results are time dependent. Hash-based horizontal partitioning scatters these records across servers, so a single query touches multiple servers, which significantly degrades throughput and response time. Partitioning social network data is difficult because of the power-law degree distribution of the friendship graph and the time dependency of queries and user activities. The power-law degree distribution means that replication-based partitioning requires a very large amount of extra storage, and the time dependency makes query-driven partitioning ineffective. We propose to partition the data not only along the spatial network of social relations but also along the time dimension, so that users who have communicated in a given period are grouped together. We build an activity prediction graph that captures relationships with strong activity and serves as the basis for partitioning. New nodes that appear in the current period are added greedily. We test the partitioning results by emulating Facebook page downloads and show that our algorithm achieves 10 times better data locality than hash-based horizontal partitioning algorithms. We assess the quality of activity prediction by observing that the algorithm with prediction achieves 80% of the data locality achieved with perfect knowledge of the current period.
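The contrast between hash-based placement and activity-aware greedy placement can be illustrated with a small sketch. This is not the algorithm evaluated above: the abstract does not specify how the activity prediction graph is built or partitioned, so the edge weights, the capacity limit, the tie-breaking rule, and the function names (`hash_partition`, `greedy_add`, `feed_locality`) are all assumptions made for illustration. Data locality is taken here to mean the fraction of friend-record reads served from the reader's own server.

```python
from collections import defaultdict


def hash_partition(users, num_servers):
    """Baseline: place each user by integer id modulo the server count."""
    return {u: u % num_servers for u in users}


def greedy_add(assignment, new_user, activity, num_servers, capacity):
    """Place new_user on the non-full server that already holds the most
    activity weight with it (hypothetical greedy rule, not the paper's)."""
    load = defaultdict(int)
    for server in assignment.values():
        load[server] += 1

    # Sum predicted activity between new_user and each server's residents.
    weight = defaultdict(float)
    for (a, b), w in activity.items():
        if a == new_user and b in assignment:
            weight[assignment[b]] += w
        elif b == new_user and a in assignment:
            weight[assignment[a]] += w

    candidates = [s for s in range(num_servers) if load[s] < capacity]
    if not candidates:
        raise ValueError("all servers are at capacity")
    # Prefer higher activity weight, then lower load, then lower server id.
    best = max(candidates, key=lambda s: (weight[s], -load[s], -s))
    assignment[new_user] = best
    return best


def feed_locality(assignment, friends):
    """Fraction of friend-record reads that stay on the reader's server."""
    local = total = 0
    for user, friend_list in friends.items():
        for f in friend_list:
            total += 1
            local += assignment[user] == assignment[f]
    return local / total if total else 0.0


# Toy data (made up): two groups of three users who interact internally.
friends = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
activity = {(0, 1): 3.0, (0, 2): 2.0, (1, 2): 1.0,
            (3, 4): 3.0, (3, 5): 2.0, (4, 5): 1.0}

hashed = hash_partition(friends, num_servers=2)
hash_locality = feed_locality(hashed, friends)

greedy = {}
for user in sorted(friends):
    greedy_add(greedy, user, activity, num_servers=2, capacity=3)
greedy_locality = feed_locality(greedy, friends)

print(f"hash locality:   {hash_locality:.2f}")
print(f"greedy locality: {greedy_locality:.2f}")
```

On this toy input the greedy placement keeps each group on one server, while modulo hashing splits both groups. The gap on real workloads depends on the graph and the activity trace, and this sketch makes no claim about reproducing the reported figures.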