Scuba: Diving into Data at Facebook

  • Authors:
  • Lior Abraham;John Allen;Oleksandr Barykin;Vinayak Borkar;Bhuwan Chopra;Ciprian Gerea;Daniel Merl;Josh Metzler;David Reiss;Subbu Subramanian;Janet L. Wiener;Okay Zed

  • Affiliations:
  • Facebook, Inc. Menlo Park, CA;Facebook, Inc. Menlo Park, CA;Facebook, Inc. Menlo Park, CA;Facebook, Inc. Menlo Park, CA;Facebook, Inc. Menlo Park, CA;Facebook, Inc. Menlo Park, CA;Facebook, Inc. Menlo Park, CA;Facebook, Inc. Menlo Park, CA;Facebook, Inc. Menlo Park, CA;Facebook, Inc. Menlo Park, CA;Facebook, Inc. Menlo Park, CA;Facebook, Inc. Menlo Park, CA

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2013

Abstract

Facebook takes performance monitoring seriously. Performance issues can impact over one billion users, so we track thousands of servers, hundreds of PB of daily network traffic, hundreds of daily code changes, and many other metrics. We require latencies of under a minute from events occurring (a client request on a phone, a bug report filed, a code change checked in) to graphs showing those events on developers' monitors. Scuba is the data management system Facebook uses for most real-time analysis. Scuba is a fast, scalable, distributed, in-memory database built at Facebook. It currently ingests millions of rows (events) per second and expires data at the same rate. Scuba stores data completely in memory on hundreds of servers, each with 144 GB RAM. To process each query, Scuba aggregates data from all servers. Scuba processes almost a million queries per day. Scuba is used extensively for interactive, ad hoc analysis queries that run in under a second over live data. In addition, Scuba is the workhorse behind Facebook's code regression analysis, bug report monitoring, ads revenue monitoring, and performance debugging.
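
The abstract states that every query aggregates data from all servers and that old data is expired as new rows arrive. The following is a minimal sketch of that pattern only, not Scuba's actual implementation. The names `Leaf`, `partial_count`, `query_count`, and `expire`, the row layout, and the random placement of rows on servers are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Hypothetical stand-in for one server that keeps its rows in memory."""
    rows: list = field(default_factory=list)

    def add(self, row):
        self.rows.append(row)

    def expire(self, min_time):
        # Drop rows older than the retention cutoff (assumed time-based policy).
        self.rows = [r for r in self.rows if r["time"] >= min_time]

    def partial_count(self, group_by, since):
        # Each leaf aggregates only the rows it holds.
        counts = defaultdict(int)
        for r in self.rows:
            if r["time"] >= since:
                counts[r[group_by]] += 1
        return counts


def query_count(leaves, group_by, since):
    # Fan the query out to every leaf, then merge the partial results.
    total = defaultdict(int)
    for leaf in leaves:
        for key, n in leaf.partial_count(group_by, since).items():
            total[key] += n
    return dict(total)


# Illustrative data: five events spread over four leaves.
leaves = [Leaf() for _ in range(4)]
rng = random.Random(0)
events = [
    {"time": 100, "endpoint": "/home"},
    {"time": 105, "endpoint": "/feed"},
    {"time": 110, "endpoint": "/home"},
    {"time": 120, "endpoint": "/feed"},
    {"time": 130, "endpoint": "/home"},
]
for e in events:
    rng.choice(leaves).add(e)  # placement policy is an assumption

before_expiry = query_count(leaves, "endpoint", since=0)

# Expire everything older than time 110 on every leaf, then query again.
for leaf in leaves:
    leaf.expire(min_time=110)
after_expiry = query_count(leaves, "endpoint", since=0)
```

Because counts are merged across all leaves, the result does not depend on which leaf a row landed on. That property is what lets a design of this kind place rows freely and still answer each query by combining per-server partial aggregates.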