ACM Transactions on Programming Languages and Systems (TOPLAS)
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Gigascope: high performance network monitoring with an SQL interface
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Exploiting Punctuation Semantics in Continuous Data Streams
IEEE Transactions on Knowledge and Data Engineering
TelegraphCQ: continuous dataflow processing
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Fjording the Stream: An Architecture for Queries Over Streaming Sensor Data
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Aurora: a new model and architecture for data stream management
The VLDB Journal — The International Journal on Very Large Data Bases
High-Availability Algorithms for Distributed Stream Processing
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Flexible time management in data stream systems
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
A heartbeat mechanism and its application in gigascope
VLDB '05 Proceedings of the 31st international conference on Very large data bases
The 8 requirements of real-time stream processing
ACM SIGMOD Record
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Bigtable: A Distributed Storage System for Structured Data
ACM Transactions on Computer Systems (TOCS)
Out-of-order processing: a new architecture for high-performance stream systems
Proceedings of the VLDB Endowment
Fault-tolerant stream processing using a distributed, replicated file system
Proceedings of the VLDB Endowment
Stateful bulk processing for incremental analytics
Proceedings of the 1st ACM symposium on Cloud computing
Centrifuge: integrated lease management and partitioning for cloud services
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Large-scale incremental processing using distributed transactions and notifications
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
S4: Distributed Stream Computing Platform
ICDMW '10 Proceedings of the 2010 IEEE International Conference on Data Mining Workshops
CIEL: a universal execution engine for distributed data-flow computing
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Spanner: Google's globally-distributed database
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Resilient X10: efficient failure-aware programming
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Aggregation and degradation in JetStream: streaming analytics in the wide area
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework's fault-tolerance guarantees. This paper describes MillWheel's programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel's features are used. MillWheel's programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel's unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google.