Replay-based approaches to revision processing in stream query engines

Authors:
Anurag S. Maskey;Mitch Cherniack
Affiliations:
Brandeis University, Waltham, MA;Brandeis University, Waltham, MA
Venue:
SSPS '08 Proceedings of the 2nd international workshop on Scalable stream processing system
Year:
2008

References (11):

Wave-indices: indexing evolving databases
    SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data

NiagaraCQ: a scalable continuous query system for Internet databases
    SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data

The TSQL2 Temporal Query Language
    The TSQL2 Temporal Query Language

STREAM: the Stanford stream data manager (demonstration description)
    Proceedings of the 2003 ACM SIGMOD international conference on Management of data

TelegraphCQ: continuous dataflow processing
    Proceedings of the 2003 ACM SIGMOD international conference on Management of data

High-Availability Algorithms for Distributed Stream Processing
    ICDE '05 Proceedings of the 21st International Conference on Data Engineering

Update-pattern-aware modeling and processing of continuous queries
    Proceedings of the 2005 ACM SIGMOD international conference on Management of data

B-tree indexes for high update rates
    ACM SIGMOD Record

Enabling Real-Time Querying of Live and Historical Stream Data
    SSDBM '07 Proceedings of the 19th International Conference on Scientific and Statistical Database Management

Remembrance of streams past: overload-sensitive management of archived streams
    VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Linear road: a stream data management benchmark
    VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Cited by (4):

Retractable complex event processing and stream reasoning
    RuleML'2011 Proceedings of the 5th international conference on Rule-based reasoning, programming, and applications

UpStream: storage-centric load management for streaming applications with update semantics
    The VLDB Journal — The International Journal on Very Large Data Bases

Probabilistic inference of object identifications for event stream analytics
    Proceedings of the 16th International Conference on Extending Database Technology

Reliable speculative processing of out-of-order event streams in generic publish/subscribe middlewares
    Proceedings of the 7th ACM international conference on Distributed event-based systems

Abstract

Data stream processing systems have become ubiquitous in academic and commercial sectors, with application areas that include financial services, network traffic analysis, battlefield monitoring, and traffic control. The append-only model of streams implies that input data is immutable and therefore always correct. In practice, however, streaming data sources often contend with noise (e.g., embedded sensors) or data entry errors (e.g., financial data feeds), resulting in erroneous inputs and, by implication, erroneous query results. Many data stream sources (e.g., Reuters ticker feeds) issue "revision tuples" (revisions) that amend previously issued tuples (e.g., erroneous share prices). A stream processing engine might reasonably respond to revision inputs by generating revision outputs that correct previously emitted query results. We know of no stream processing system that presently has this capability. In this paper, we describe how a stream processing engine can be extended to support revision processing via replay. Replay-based revision processing techniques assume that a stream engine maintains an archive of recent data seen on each of its input streams. These archives are queried in response to a revision, and the resulting tuples are replayed through the system to generate corrected query outputs. We first present the design and implementation of the revision processing engine for the Borealis stream processing engine [1]. We then compare techniques for archiving streams to support replay, and compare the performance and overhead of two revision processing techniques that replay input tuples to recompute, and thereby revise, previously output query results. These experiments reveal scalability issues due to the overhead required to maintain stream archives, and have motivated our current research on using sampling and data summarization (e.g., histograms) to reduce the data that must be stored in a stream archive.
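
The replay idea described in the abstract can be illustrated with a small sketch. This is not the Borealis implementation and does not reproduce either of the two techniques evaluated in the paper; it is a minimal illustration under stated assumptions: a single input stream, a tumbling-window sum as the only query, an unbounded in-memory archive keyed by a tuple identifier, and revisions that change a tuple's value but not its timestamp. All names (`StreamTuple`, `ReplayingWindowSum`, `revise`, and so on) are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StreamTuple:
    tuple_id: int    # hypothetical unique id that a revision uses to name its target
    timestamp: int   # event time, in seconds
    value: float


class ReplayingWindowSum:
    """Tumbling-window sum that archives its input and replays it on revision."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.archive = {}   # tuple_id -> StreamTuple (assumed unbounded here)
        self.emitted = {}   # window start -> most recently emitted sum

    def window_start(self, timestamp):
        return timestamp - timestamp % self.window_size

    def _replay(self, start):
        # Query the archive for the window's tuples and recompute the sum.
        return sum(t.value for t in self.archive.values()
                   if self.window_start(t.timestamp) == start)

    def insert(self, t):
        self.archive[t.tuple_id] = t

    def close_window(self, start):
        # Emit the ordinary (non-revision) result for a finished window.
        self.emitted[start] = self._replay(start)
        return ("result", start, self.emitted[start])

    def revise(self, tuple_id, new_value):
        old = self.archive.get(tuple_id)
        if old is None:
            return None  # target is not in the archive; nothing to replay
        self.archive[tuple_id] = replace(old, value=new_value)
        start = self.window_start(old.timestamp)
        if start not in self.emitted:
            return None  # window not yet output; the fix is picked up on close
        corrected = self._replay(start)
        if corrected == self.emitted[start]:
            return None  # the revision did not change the emitted result
        self.emitted[start] = corrected
        return ("revision", start, corrected)


op = ReplayingWindowSum(window_size=10)
op.insert(StreamTuple(1, 1, 10.0))
op.insert(StreamTuple(2, 4, 20.0))
op.insert(StreamTuple(3, 12, 5.0))
print(op.close_window(0))   # ('result', 0, 30.0)
print(op.revise(2, 25.0))   # ('revision', 0, 35.0)
```

The sketch also hints at the scalability concern raised in the abstract: every input tuple must be retained for as long as it may be revised, so the archive grows with both the input rate and the revision horizon.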