A latency and fault-tolerance optimizer for online parallel query plans

Authors:
Prasang Upadhyaya; YongChul Kwon; Magdalena Balazinska
Affiliations:
University of Washington, Seattle, WA, USA (all three authors)
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data
Year:
2011

Abstract

We address the problem of making online, parallel query plans fault-tolerant, that is, providing intra-query fault tolerance without blocking. We develop an approach that not only achieves this goal but does so by using different fault-tolerance techniques at different operators within a query plan. Enabling each operator to use a different fault-tolerance strategy leads to a space of fault-tolerance plans amenable to cost-based optimization. We develop FTOpt, a cost-based fault-tolerance optimizer that automatically selects the best strategy for each operator in a query plan so as to minimize the expected processing time of the entire query in the presence of failures. We implement our approach in a prototype parallel query-processing engine. Our experiments demonstrate that (1) there is no single best fault-tolerance strategy for all query plans, (2) hybrid strategies that mix and match recovery techniques often outperform any uniform strategy, and (3) our optimizer correctly identifies winning fault-tolerance configurations.
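To make the idea of a "space of fault-tolerance plans" concrete, here is a minimal sketch of per-operator strategy selection. It rests on several assumptions that are not taken from the paper. It models a linear chain of operators executed one after another, whereas real online plans are pipelined and run concurrently. It offers only two hypothetical strategies, "none" and "materialize". It assumes a constant failure rate. It uses a first-order cost model in which each failure forces a redo of all work since the nearest materialized output upstream. The names `expected_time` and `best_plan` and the cost model itself are illustrative only. They are not FTOpt's actual cost model or search procedure, which the abstract does not describe.

```python
from itertools import product

# Hypothetical strategy catalogue (assumption; not FTOpt's actual set).
STRATEGIES = ("none", "materialize")


def expected_time(work, mat_cost, failure_rate, plan):
    """Expected running time of a linear operator chain under a toy model.

    work[i]:      failure-free processing time of operator i (seconds)
    mat_cost[i]:  extra time to materialize operator i's output
    failure_rate: expected failures per second of processing (assumed constant)
    plan[i]:      strategy chosen for operator i
    """
    total = 0.0
    since_restart_point = 0.0  # work lost if a failure happens now
    for w, m, s in zip(work, mat_cost, plan):
        since_restart_point += w
        # Failure-free time, plus (expected failures while this operator runs)
        # times (work redone since the last materialized output upstream).
        total += w + failure_rate * w * since_restart_point
        if s == "materialize":
            total += m  # pay the materialization overhead
            since_restart_point = 0.0  # later failures restart from here
    return total


def best_plan(work, mat_cost, failure_rate):
    """Exhaustively search every per-operator assignment of strategies."""
    return min(
        product(STRATEGIES, repeat=len(work)),
        key=lambda plan: expected_time(work, mat_cost, failure_rate, plan),
    )


if __name__ == "__main__":
    work = [100.0, 100.0, 100.0]
    mat_cost = [1.0, 50.0, 50.0]
    print(best_plan(work, mat_cost, 0.001))
```

With these made-up inputs, `best_plan` returns `('materialize', 'none', 'none')` with an expected time of 341. That beats both uniform plans: all "none" costs 360 and all "materialize" costs 431. This is a toy-scale echo of the abstract's observation that hybrid configurations can outperform any uniform strategy. Exhaustive search grows exponentially with the number of operators and is used here only for brevity.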