MillWheel: fault-tolerant stream processing at internet scale

Authors:
Tyler Akidau;Alex Balikov;Kaya Bekiroğlu;Slava Chernyak;Josh Haberman;Reuven Lax;Sam McVeety;Daniel Mills;Paul Nordstrom;Sam Whittle
Affiliations:
Google;Google;Google;Google;Google;Google;Google;Google;Google;Google
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Citing 24
Cited 3

Virtual time

ACM Transactions on Programming Languages and Systems (TOPLAS)
Time, clocks, and the ordering of events in a distributed system

Communications of the ACM
Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Gigascope: high performance network monitoring with an SQL interface

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Exploiting Punctuation Semantics in Continuous Data Streams

IEEE Transactions on Knowledge and Data Engineering
TelegraphCQ: continuous dataflow processing

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Fjording the Stream: An Architecture for Queries Over Streaming Sensor Data

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Aurora: a new model and architecture for data stream management

The VLDB Journal — The International Journal on Very Large Data Bases
High-Availability Algorithms for Distributed Stream Processing

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Flexible time management in data stream systems

PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
A heartbeat mechanism and its application in gigascope

VLDB '05 Proceedings of the 31st international conference on Very large data bases
The 8 requirements of real-time stream processing

ACM SIGMOD Record
MapReduce: simplified data processing on large clusters

Communications of the ACM - 50th anniversary issue: 1958 - 2008
Bigtable: A Distributed Storage System for Structured Data

ACM Transactions on Computer Systems (TOCS)
Out-of-order processing: a new architecture for high-performance stream systems

Proceedings of the VLDB Endowment
Fault-tolerant stream processing using a distributed, replicated file system

Proceedings of the VLDB Endowment
Stateful bulk processing for incremental analytics

Proceedings of the 1st ACM symposium on Cloud computing
Centrifuge: integrated lease management and partitioning for cloud services

NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Large-scale incremental processing using distributed transactions and notifications

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
S4: Distributed Stream Computing Platform

ICDMW '10 Proceedings of the 2010 IEEE International Conference on Data Mining Workshops
CIEL: a universal execution engine for distributed data-flow computing

Proceedings of the 8th USENIX conference on Networked systems design and implementation
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing

NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters

HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Spanner: Google's globally-distributed database

OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation

Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

ACM SIGOPS 24th Symposium on Operating Systems Principles
Resilient X10: efficient failure-aware programming

Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Aggregation and degradation in JetStream: streaming analytics in the wide area

NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation

Quantified Score

Hi-index	0.00

Visualization

Abstract

MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework's fault-tolerance guarantees. This paper describes MillWheel's programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel's features are used. MillWheel's programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel's unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google.