Making every bit count in wide-area analytics

Authors:
Ariel Rabkin;Matvey Arye;Siddhartha Sen;Vivek Pai;Michael J. Freedman
Affiliations:
Princeton University;Princeton University;Princeton University;Princeton University;Princeton University
Venue:
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Year:
2013

Citing 16
Cited 2

Multiple-query optimization
ACM Transactions on Database Systems (TODS)

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Data Mining and Knowledge Discovery

TelegraphCQ: continuous dataflow processing
Proceedings of the 2003 ACM SIGMOD international conference on Management of data

TAG: a Tiny AGgregation service for Ad-Hoc sensor networks
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation

An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms

Stream Cube: An Architecture for Multi-Dimensional Analysis of Data Streams
Distributed and Parallel Databases

Network-Aware Operator Placement for Stream-Processing Systems
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering

SPADE: the System S declarative stream processing engine
Proceedings of the 2008 ACM SIGMOD international conference on Management of data

The ORCHESTRA Collaborative Data Sharing System
ACM SIGMOD Record

Fast and Reliable Stream Processing over Wide Area Networks
ICDEW '07 Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering Workshop

Stream warehousing with DataDepot
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Do you know your IQ?: a research agenda for information quality in systems
ACM SIGMETRICS Performance Evaluation Review

Advances and challenges in log analysis
Communications of the ACM

DBToaster: higher-order delta processing for dynamic, frequently fresh views
Proceedings of the VLDB Endowment

REX: recursive, delta-based data-centric computation
Proceedings of the VLDB Endowment

Blink and it's done: interactive queries on very large data
Proceedings of the VLDB Endowment

Wide-area streaming analytics: distributing the data cube
Proceedings of the 4th annual Symposium on Cloud Computing

Aggregation and degradation in JetStream: streaming analytics in the wide area
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation

Abstract

Many data sets, such as system logs, are generated from widely distributed locations. Current distributed systems often discard this data because they lack the ability to backhaul it efficiently, or to do anything meaningful with it at the distributed sites. This leads to lost functionality, efficiency, and business opportunities. The problem with traditional backhaul approaches is that they are slow and costly, and require analysts to define the data they are interested in up-front. We propose a new architecture that stores data at the edge (i.e., near where it is generated) and supports rich real-time and historical queries on this data, while adjusting data quality to cope with the vagaries of wide-area bandwidth. In essence, this design transforms a distributed data collection system into a distributed data analysis system, where decisions about collection do not preclude decisions about analysis.