Implementing fault-tolerant services using the state machine approach: a tutorial
ACM Computing Surveys (CSUR)
Encapsulation of parallelism in the Volcano query processing system
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Implementing recoverable requests using queues
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Parallel database systems: the future of high performance database systems
Communications of the ACM
Query evaluation techniques for large databases
ACM Computing Surveys (CSUR)
NiagaraCQ: a scalable continuous query system for Internet databases
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Bro: a system for detecting network intruders in real-time
Computer Networks: The International Journal of Computer and Telecommunications Networking
Byzantine generals in action: implementing fail-stop processors
ACM Transactions on Computer Systems (TOCS)
Proceedings of the twentieth annual ACM symposium on Principles of distributed computing
Phoenix project: fault-tolerant applications
ACM SIGMOD Record
Transaction Processing: Concepts and Techniques
A Case for NOW (Networks of Workstations)
IEEE Micro
Chained Declustering: A New Availability Strategy for Multiprocessor Database Machines
Proceedings of the Sixth International Conference on Data Engineering
Managing Intra-operator Parallelism in Parallel Database Systems
VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases
FTCS '98 Proceedings of the Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing
Stateful Intrusion Detection for High-Speed Networks
SP '02 Proceedings of the 2002 IEEE Symposium on Security and Privacy
ISIS: A System for Fault-Tolerant Distributed Computing
The Horus and Ensemble Projects: Accomplishments and Limitations
A Stateful Intrusion Detection System for World-Wide Web Servers
ACSAC '03 Proceedings of the 19th Annual Computer Security Applications Conference
Load management and high availability in the Medusa distributed stream processing system
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
High-Availability Algorithms for Distributed Stream Processing
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Fault-tolerance in the Borealis distributed stream processing system
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Parallel querying with non-dedicated computers
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Customizable parallel execution of scientific stream queries
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Towards correcting input data errors probabilistically using integrity constraints
MobiDE '06 Proceedings of the 5th ACM international workshop on Data engineering for wireless and mobile access
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Staying FIT: efficient load shedding techniques for distributed stream processing
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Fault-tolerance in the Borealis distributed stream processing system
ACM Transactions on Database Systems (TODS)
Foundations and Trends in Databases
Borealis-R: a replication-transparent stream processing system for wide-area monitoring applications
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Replica placement for high availability in distributed stream processing systems
Proceedings of the second international conference on Distributed event-based systems
Wide-scale data stream management
ATC'08 USENIX 2008 Annual Technical Conference
Challenges in dependable internet-scale stream processing
Proceedings of the 2nd workshop on Dependable distributed data management
Fault-tolerant stream processing using a distributed, replicated file system
Proceedings of the VLDB Endowment
Adaptive workload allocation in query processing in autonomous heterogeneous environments
Distributed and Parallel Databases
Autonomic query parallelization using non-dedicated computers: an evaluation of adaptivity options
The VLDB Journal — The International Journal on Very Large Data Bases
A Vision for Next Generation Query Processors and an Associated Research Agenda
Globe '09 Proceedings of the 2nd International Conference on Data Management in Grid and Peer-to-Peer Systems
Distributed event stream processing with non-deterministic finite automata
Proceedings of the Third ACM International Conference on Distributed Event-Based Systems
An empirical study of high availability in stream processing systems
Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
Integration of reliable sensor data stream management into digital libraries
DELOS'07 Proceedings of the 1st international conference on Digital libraries: research and development
Continuous analytics over discontinuous streams
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Detouring and replication for fast and reliable internet-scale stream processing
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Proceedings of the VLDB Endowment
Reliable distributed data stream management in mobile environments
Information Systems
A latency and fault-tolerance optimizer for online parallel query plans
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Efficient and coordinated checkpointing for reliable distributed data stream management
ADBIS'06 Proceedings of the 10th East European conference on Advances in Databases and Information Systems
Processing flows of information: From data stream to complex event processing
ACM Computing Surveys (CSUR)
SkewTune: mitigating skew in MapReduce applications
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Deriving a unified fault taxonomy for event-based systems
Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems
Partition and compose: parallel complex event processing
Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems
Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing
Failover and takeover contingency mechanisms for network partition and node failure
Proceedings of the eleventh ACM SIGPLAN workshop on Erlang workshop
Declarative distributed advertisement system for iDTV: an industrial experience
Proceedings of the 14th symposium on Principles and practice of declarative programming
Auto-parallelizing stateful distributed streaming applications
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Pollux: towards scalable distributed real-time search on microblogs
Proceedings of the 16th International Conference on Extending Database Technology
TimeStream: reliable stream computation in the cloud
Proceedings of the 8th ACM European Conference on Computer Systems
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Discretized streams: fault-tolerant streaming computation at scale
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
A catalog of stream processing optimizations
ACM Computing Surveys (CSUR)
We present a technique that masks failures in a cluster to provide high availability and fault tolerance for long-running, parallelized dataflows. These dataflows can implement a variety of continuous query (CQ) applications that require high-throughput, 24x7 operation, such as network monitoring, phone call processing, click-stream processing, and online financial analysis. Our main contribution is a scheme that carefully integrates traditional query processing techniques for partitioned parallelism with the process-pairs approach for high availability. This integration allows us to tolerate failures of portions of a parallel dataflow without sacrificing result quality. Upon failure, our technique provides quick fail-over and automatically recovers the lost pieces on the fly. Compared with the straightforward application of the process-pairs technique on a per-dataflow basis, this piecemeal recovery causes minimal disruption to the ongoing dataflow computation and improves reliability. Thus, our technique provides the high availability necessary for critical CQ applications. Our techniques are encapsulated in a reusable dataflow operator called Flux, an extension of the Exchange operator that is used to compose parallel dataflows. Encapsulating the fault-tolerance logic in Flux minimizes modifications to existing operator code and relieves operator writers of the burden of repeatedly implementing and verifying this critical logic. We present experiments illustrating these features with an implementation of Flux in the TelegraphCQ code base [8].
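The idea of combining partitioned parallelism with process pairs can be illustrated with a small toy model. The sketch below is not the Flux protocol and does not use TelegraphCQ code; every name in it (`CountingOperator`, `ProcessPair`, `PartitionedDataflow`) is hypothetical. It assumes an in-memory, single-process setting in which each partition of a hash-partitioned dataflow keeps two replicas, and only the failed partition performs fail-over and recovery while the others keep running untouched.

```python
import copy
import zlib


class CountingOperator:
    """Toy stateful operator (hypothetical): counts tuples per key."""

    def __init__(self):
        self.counts = {}
        self.alive = True

    def process(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1


class ProcessPair:
    """Two replicas of one partition; both consume the same input."""

    def __init__(self):
        self.primary = CountingOperator()
        self.secondary = CountingOperator()

    def process(self, key):
        # Feed every live replica so the standby stays in sync.
        for replica in (self.primary, self.secondary):
            if replica.alive:
                replica.process(key)

    def fail_primary(self):
        # Simulated failure: the primary stops consuming input.
        self.primary.alive = False

    def recover(self):
        if self.primary.alive:
            return
        # Fail-over: promote the surviving standby.
        self.primary = self.secondary
        # Piecemeal recovery: rebuild a standby from the survivor's state.
        self.secondary = copy.deepcopy(self.primary)


class PartitionedDataflow:
    """Hash-partitions keys across independent process pairs."""

    def __init__(self, num_partitions):
        self.partitions = [ProcessPair() for _ in range(num_partitions)]

    def route(self, key):
        # crc32 gives a routing hash that is stable across runs.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index]

    def process(self, key):
        self.route(key).process(key)

    def results(self):
        # Partitions hold disjoint keys, so merging cannot collide.
        merged = {}
        for pair in self.partitions:
            merged.update(pair.primary.counts)
        return merged
```

A caller would build a `PartitionedDataflow`, feed it keys with `process`, and, after a simulated `fail_primary()` on one pair, call `recover()` on that pair alone. A real system would additionally have to detect failures, handle in-flight tuples, and copy state across machines, all of which this sketch leaves out.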