We address the problem of making online, parallel query plans fault-tolerant, that is, of providing intra-query fault tolerance without blocking. We develop an approach that not only achieves this goal but does so by using different fault-tolerance techniques at different operators within a query plan. Enabling each operator to use a different fault-tolerance strategy leads to a space of fault-tolerance plans amenable to cost-based optimization. We develop FTOpt, a cost-based fault-tolerance optimizer that automatically selects the best strategy for each operator in a query plan so as to minimize the expected processing time, with failures, for the entire query. We implement our approach in a prototype parallel query-processing engine. Our experiments demonstrate that (1) there is no single best fault-tolerance strategy for all query plans, (2) hybrid strategies that mix and match recovery techniques often outperform any uniform strategy, and (3) our optimizer correctly identifies winning fault-tolerance configurations.
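To make the idea of a per-operator strategy search concrete, the following is a minimal sketch and not FTOpt itself. Everything in it is an assumption made for illustration: the strategy labels (`none`, `checkpoint`, `materialize`), the `Operator` fields, the linear pipeline shape, the toy cost model (at most one failure per operator, checkpointing halves an operator's own rework, materialized output stops upstream replay), and the exhaustive enumeration. The abstract does not specify any of these.

```python
from dataclasses import dataclass
from itertools import product

# Illustrative strategy labels only; the abstract does not name the techniques.
STRATEGIES = ("none", "checkpoint", "materialize")


@dataclass(frozen=True)
class Operator:
    name: str
    runtime: float      # failure-free processing time
    state_cost: float   # assumed extra time to checkpoint operator state
    output_cost: float  # assumed extra time to materialize operator output
    fail_prob: float    # assumed probability of one failure while running


def expected_time(ops, plan):
    """Toy expected processing time of a linear pipeline under `plan`.

    Assumed model: a failed operator redoes its own work (half of it if it
    checkpoints) and replays every upstream operator back to the nearest
    one whose output was materialized.
    """
    total = 0.0
    for i, (op, strategy) in enumerate(zip(ops, plan)):
        overhead = {
            "none": 0.0,
            "checkpoint": op.state_cost,
            "materialize": op.output_cost,
        }[strategy]
        own_redo = op.runtime * (0.5 if strategy == "checkpoint" else 1.0)
        replay = 0.0
        j = i - 1
        while j >= 0 and plan[j] != "materialize":
            replay += ops[j].runtime
            j -= 1
        total += op.runtime + overhead + op.fail_prob * (own_redo + replay)
    return total


def best_plan(ops):
    """Enumerate every per-operator assignment and keep the cheapest."""
    candidates = product(STRATEGIES, repeat=len(ops))
    return min(candidates, key=lambda plan: expected_time(ops, plan))


# Hypothetical three-operator pipeline with made-up costs.
EXAMPLE_OPS = (
    Operator("scan", runtime=100, state_cost=1, output_cost=50, fail_prob=0.1),
    Operator("join", runtime=200, state_cost=40, output_cost=5, fail_prob=0.1),
    Operator("aggregate", runtime=50, state_cost=2, output_cost=1, fail_prob=0.1),
)

if __name__ == "__main__":
    plan = best_plan(EXAMPLE_OPS)
    print(plan, expected_time(EXAMPLE_OPS, plan))
    for s in STRATEGIES:
        uniform = (s,) * len(EXAMPLE_OPS)
        print(uniform, expected_time(EXAMPLE_OPS, uniform))
```

With these made-up numbers, the search selects the hybrid assignment (`checkpoint`, `materialize`, `checkpoint`) at an expected time of about 395.5, compared with about 425, 450.5, and 441 for the uniform `none`, `checkpoint`, and `materialize` plans. This only illustrates why a mixed plan can win under some cost model; it says nothing about the paper's actual measurements. Exhaustive enumeration grows exponentially with the number of operators, so a real optimizer would need a smarter search.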