Wide-area streaming analytics: distributing the data cube

  • Authors:
  • Benjamin Heintz;Abhishek Chandra;Ramesh K. Sitaraman

  • Affiliations:
  • University of Minnesota;University of Minnesota;University of Massachusetts & Akamai Technologies

  • Venue:
  • Proceedings of the 4th annual Symposium on Cloud Computing
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

To date, much research in data-intensive computing has focused on batch computation. Increasingly, however, it is necessary to derive knowledge from big data streams. As a motivating example, consider a content delivery network (CDN) such as Akamai [4], comprising thousands of servers in hundreds of globally distributed locations. Each of these servers produces a stream of log data, recording for example every user it serves, along with each video stream they access, when they play and pause streams, and more. Each server also records network- and system-level data such as TCP connection statistics. In aggregate, the servers produce billions of lines of log data from over a thousand locations daily.