Run-time operator state spilling for memory intensive long-running queries
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Design, implementation, and evaluation of the linear road bnchmark on the stream processing core
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
SPC: a distributed, scalable platform for data mining
Proceedings of the 4th international workshop on Data mining standards, services and platforms
Linear road: a stream data management benchmark
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Scalable and near real-time burst detection from eCommerce queries
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
The cost of a cloud: research problems in data center networks
ACM SIGCOMM Computer Communication Review
Elastic scaling of data parallel operators in stream processing
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
An empirical study of high availability in stream processing systems
Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
A comparison of join algorithms for log processing in MaPreduce
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
A Hybrid Approach to High Availability in Stream Processing Systems
ICDCS '10 Proceedings of the 2010 IEEE 30th International Conference on Distributed Computing Systems
S4: Distributed Stream Computing Platform
ICDMW '10 Proceedings of the 2010 IEEE International Conference on Data Mining Workshops
Big data and cloud computing: current state and future opportunities
Proceedings of the 14th International Conference on Extending Database Technology
Distributed middleware reliability and fault tolerance support in system S
Proceedings of the 5th ACM international conference on Distributed event-based system
Understanding network failures in data centers: measurement, analysis, and implications
Proceedings of the ACM SIGCOMM 2011 conference
IEEE Transactions on Parallel and Distributed Systems
Esc: Towards an Elastic Stream Computing Platform for the Cloud
CLOUD '11 Proceedings of the 2011 IEEE 4th International Conference on Cloud Computing
CEC: Continuous eventual checkpointing for data stream processing operators
DSN '11 Proceedings of the 2011 IEEE/IFIP 41st International Conference on Dependable Systems&Networks
Active Replication at (Almost) No Cost
SRDS '11 Proceedings of the 2011 IEEE 30th International Symposium on Reliable Distributed Systems
Managing parallelism for stream processing in the cloud
Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing
Virtualizing stream processing
Middleware'11 Proceedings of the 12th ACM/IFIP/USENIX international conference on Middleware
Partition and compose: parallel complex event processing
Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems
DBToaster: higher-order delta processing for dynamic, frequently fresh views
Proceedings of the VLDB Endowment
Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
StreamCloud: An Elastic and Scalable Data Streaming System
IEEE Transactions on Parallel and Distributed Systems
Report on the first workshop on innovative querying of streams
ACM SIGMOD Record
Streamforce: outsourcing access control enforcement for stream data to the clouds
Proceedings of the 4th ACM conference on Data and application security and privacy
Hi-index | 0.00 |
As users of "big data" applications expect fresh results, we witness a new breed of stream processing systems (SPS) that are designed to scale to large numbers of cloud-hosted machines. Such systems face new challenges: (i) to benefit from the "pay-as-you-go" model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators when the workload increases; (ii) failures are common with deployments on hundreds of VMs-systems must be fault-tolerant with fast recovery times, yet low per-machine overheads. An open question is how to achieve these two goals when stream queries include stateful operators, which must be scaled out and recovered without affecting query results. Our key idea is to expose internal operator state explicitly to the SPS through a set of state management primitives. Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators. Externalised operator state is checkpointed periodically by the SPS and backed up to upstream VMs. The SPS identifies individual operator bottlenecks and automatically scales them out by allocating new VMs and partitioning the checkpointed state. At any point, failed operators are recovered by restoring checkpointed state on a new VM and replaying unprocessed tuples. We evaluate this approach with the Linear Road Benchmark on the Amazon EC2 cloud platform and show that it can scale automatically to a load factor of L=350 with 50 VMs, while recovering quickly from failures.