The little engine(s) that could: scaling online social networks

  • Authors:
  • Josep M. Pujol;Vijay Erramilli;Georgos Siganos;Xiaoyuan Yang;Nikos Laoutaris;Parminder Chhabra;Pablo Rodriguez

  • Affiliations:
  • Telefonica Research, Barcelona, Spain;Telefonica Research, Barcelona, Spain;Telefonica Research, Barcelona, Spain;Telefonica Research, Barcelona, Spain;Telefonica Reserach, Barcelona, Spain;Telefonica Reserach, Barcelona, Spain;Telefonica Research, Barcelona, Spain

  • Venue:
  • Proceedings of the ACM SIGCOMM 2010 conference
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

The difficulty of scaling Online Social Networks (OSNs) has introduced new system design challenges that has often caused costly re-architecting for services like Twitter and Facebook. The complexity of interconnection of users in social networks has introduced new scalability challenges. Conventional vertical scaling by resorting to full replication can be a costly proposition. Horizontal scaling by partitioning and distributing data among multiples servers - e.g. using DHTs - can lead to costly inter-server communication. We design, implement, and evaluate SPAR, a social partitioning and replication middle-ware that transparently leverages the social graph structure to achieve data locality while minimizing replication. SPAR guarantees that for all users in an OSN, their direct neighbor's data is co-located in the same server. The gains from this approach are multi-fold: application developers can assume local semantics, i.e., develop as they would for a single server; scalability is achieved by adding commodity servers with low memory and network I/O requirements; and redundancy is achieved at a fraction of the cost. We detail our system design and an evaluation based on datasets from Twitter, Orkut, and Facebook, with a working implementation. We show that SPAR incurs minimum overhead, and can help a well-known open-source Twitter clone reach Twitter's scale without changing a line of its application logic and achieves higher throughput than Cassandra, Facebook's DHT based key-value store database.