Fault-tolerance in the Borealis distributed stream processing system

Authors:
Magdalena Balazinska;Hari Balakrishnan;Samuel Madden;Michael Stonebraker
Affiliations:
MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA;MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA;MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA;MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA
Venue:
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Year:
2005

Abstract

We present a replication-based approach to fault-tolerant distributed stream processing in the face of node failures, network failures, and network partitions. Our approach aims to reduce the degree of inconsistency in the system while guaranteeing that available inputs capable of being processed are processed within a specified time threshold. This threshold allows a user to trade availability for consistency: a larger time threshold decreases availability but limits inconsistency, while a smaller threshold increases availability but produces more inconsistent results based on partial data. In addition, when failures heal, our scheme corrects previously produced results, ensuring eventual consistency. Our scheme uses a data-serializing operator to ensure that all replicas process data in the same order, and thus remain consistent in the absence of failures. To regain consistency after a failure heals, we experimentally compare approaches based on checkpoint/redo and undo/redo techniques and illustrate the performance trade-offs between these schemes.
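
The idea of a data-serializing operator can be illustrated with a minimal sketch. This is not the operator described in the paper: the tuple layout (`StreamTuple`), the function name `serialize`, and the ordering key are all assumptions made for illustration. The sketch only shows why a deterministic order keeps replicas consistent: if every replica sorts the same set of tuples by a key that depends on tuple contents alone, the processing order no longer depends on network arrival order.

```python
from typing import Iterable, List, NamedTuple


class StreamTuple(NamedTuple):
    """Hypothetical tuple layout, not taken from the paper."""
    timestamp: int   # application-level timestamp carried by the tuple
    stream_id: str   # name of the input stream the tuple arrived on
    seq: int         # per-stream sequence number
    value: float


def serialize(batch: Iterable[StreamTuple]) -> List[StreamTuple]:
    """Return the batch in an order that depends only on tuple contents."""
    # Assumed ordering key: timestamp first, then stream name, then sequence.
    return sorted(batch, key=lambda t: (t.timestamp, t.stream_id, t.seq))


# Two replicas receive the same tuples in different arrival orders.
a1 = StreamTuple(10, "A", 0, 1.5)
b1 = StreamTuple(10, "B", 0, 2.5)
a2 = StreamTuple(11, "A", 1, 3.0)

replica_1 = serialize([a1, b1, a2])
replica_2 = serialize([a2, b1, a1])
```

Both replicas end up with the same sequence, so downstream operators see identical input. A real operator would also have to decide when a batch is complete, which is where the time threshold from the abstract comes in; that part is left out of this sketch.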