STREAM: the stanford stream data manager (demonstration description)
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
TelegraphCQ: continuous dataflow processing
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Highly available, fault-tolerant, parallel dataflows
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
High-Availability Algorithms for Distributed Stream Processing
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Fault-tolerance in the borealis distributed stream processing system
ACM Transactions on Database Systems (TODS)
Stateful bulk processing for incremental analytics
Proceedings of the 1st ACM symposium on Cloud computing
Comet: batched stream processing for data intensive distributed computing
Proceedings of the 1st ACM symposium on Cloud computing
Continuous analytics over discontinuous streams
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Large-scale incremental processing using distributed transactions and notifications
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
In-situ MapReduce for log processing
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Fast crash recovery in RAMCloud
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Muppet: MapReduce-style processing of fast data
Proceedings of the VLDB Endowment
Photon: fault-tolerant and scalable joining of continuous data streams
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Integrating scale out and fault tolerance in stream processing using operator state management
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
TimeStream: reliable stream computation in the cloud
Proceedings of the 8th ACM European Conference on Computer Systems
Exploiting application dynamism and cloud elasticity for continuous dataflows
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
"All roads lead to Rome": optimistic recovery for distributed iterative data processing
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Data-Intensive Cloud Computing: Requirements, Expectations, Challenges, and Solutions
Journal of Grid Computing
MillWheel: fault-tolerant stream processing at internet scale
Proceedings of the VLDB Endowment
Scalable progressive analytics on big data in the cloud
Proceedings of the VLDB Endowment
Semantic-based QoS management in cloud systems: Current status and future challenges
Future Generation Computer Systems
Aggregation and degradation in JetStream: streaming analytics in the wide area
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
Many important "big data" applications need to process data arriving in real time. However, current programming models for distributed stream processing are relatively low-level, often leaving the user to worry about consistency of state across the system and fault recovery. Furthermore, the models that provide fault recovery do so in an expensive manner, requiring either hot replication or long recovery times. We propose a new programming model, discretized streams (D-Streams), that offers a high-level functional programming API, strong consistency, and efficient fault recovery. D-Streams support a new recovery mechanism that improves efficiency over the traditional replication and upstream backup solutions in streaming databases: parallel recovery of lost state across the cluster. We have prototyped D-Streams in an extension to the Spark cluster computing framework called Spark Streaming, which lets users seamlessly intermix streaming, batch and interactive queries.