The little engine(s) that could: scaling online social networks

  • Authors:
  • Josep M. Pujol;Vijay Erramilli;Georgos Siganos;Xiaoyuan Yang;Nikolaos Laoutaris;Parminder Chhabra;Pablo Rodriguez

  • Affiliations:
  • 3scale, Barcelona, Spain and Telefonica Research, Barcelona, Spain;Telefonica Research, Barcelona, Spain;Telefonica Research, Barcelona, Spain;Telefonica Research, Barcelona, Spain;Telefonica Research, Barcelona, Spain;Telefonica Research, Barcelona, Spain;Telefonica Research, Barcelona, Spain

  • Venue:
  • IEEE/ACM Transactions on Networking (TON)
  • Year:
  • 2012

Abstract

The difficulty of partitioning social graphs has introduced new system design challenges for scaling online social networks (OSNs). Vertical scaling by resorting to full replication can be a costly proposition. Scaling horizontally by partitioning and distributing data among multiple servers using, for example, distributed hash tables (DHTs) can suffer from expensive interserver communication. Such challenges have often caused costly rearchitecting efforts for popular OSNs like Twitter and Facebook. We design, implement, and evaluate SPAR, a Social Partitioning and Replication middleware that mediates transparently between the application and the database layer of an OSN. SPAR leverages the underlying social graph structure to minimize the replication overhead required to ensure that users have their neighbors' data colocated on the same machine. The gains from this are multifold: application developers can assume local semantics, i.e., develop as they would for a single machine; scalability is achieved by adding commodity machines with low memory and network I/O requirements; and N+K redundancy is achieved at a fraction of the cost. We provide a complete system design, an extensive evaluation based on datasets from Twitter, Orkut, and Facebook, and a working implementation. We show that SPAR incurs minimal overhead, can help a well-known Twitter clone reach Twitter's scale without changing a line of its application logic, and achieves higher throughput than Cassandra, a popular key-value store.
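The quantity the abstract refers to as replication overhead can be illustrated with a small sketch. This is not SPAR's partitioning algorithm, which the abstract does not describe; it only counts how many replicas a given placement would need so that every user finds all neighbors' data on the server holding that user's master copy. The function names, the toy graph, and both placements are assumptions made for illustration.

```python
from collections import defaultdict

def replicas_needed(edges, master):
    """Map each server to the set of users that must be replicated there
    so that every user's neighbors are present on that user's master server.

    edges  -- iterable of undirected (u, v) friendship pairs (hypothetical input)
    master -- dict mapping each user to the server holding its master copy
    """
    replicas = defaultdict(set)
    for u, v in edges:
        if master[u] != master[v]:
            replicas[master[u]].add(v)  # copy v's data next to u's master
            replicas[master[v]].add(u)  # copy u's data next to v's master
    return replicas

def replication_overhead(edges, master):
    """Total number of replicas across all servers."""
    return sum(len(users) for users in replicas_needed(edges, master).values())

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Placement that ignores the graph, in the spirit of hashing: user id modulo 2.
hash_master = {u: u % 2 for u in range(6)}

# Placement that follows the community structure of the graph.
social_master = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

hash_cost = replication_overhead(edges, hash_master)
social_cost = replication_overhead(edges, social_master)
print(hash_cost, social_cost)
```

On this toy graph the structure-aware placement needs far fewer replicas than the hash-style one, which is the intuition behind using the social graph to keep reads local at low cost.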