Data stream processing systems have become ubiquitous in academic and commercial sectors, with application areas that include financial services, network traffic analysis, battlefield monitoring, and traffic control. The append-only model of streams implies that input data is immutable and therefore always correct. In practice, however, streaming data sources often contend with noise (e.g., embedded sensors) or data entry errors (e.g., financial data feeds), resulting in erroneous inputs and, by implication, erroneous query results. Many data stream sources (e.g., Reuters ticker feeds) issue "revision tuples" (revisions) that amend previously issued tuples (e.g., erroneous share prices). A stream processing engine might reasonably respond to revision inputs by generating revision outputs that correct previously emitted query results. We know of no stream processing system that presently has this capability. In this paper, we describe how a stream processing engine can be extended to support revision processing via replay. Replay-based revision processing techniques assume that a stream engine maintains an archive of recent data seen on each of its input streams. These archives are queried in response to a revision, and the resulting tuples are replayed through the system to generate corrected query outputs. We first present the design and implementation of the revision processing engine for the Borealis stream processing engine [1]. We then compare techniques for archiving streams to support replay, and evaluate the performance and overhead of two revision processing techniques that replay input tuples to recompute, and thereby revise, previously output query results. These experiments reveal scalability issues caused by the overhead of maintaining stream archives, and have motivated our current research on using sampling and data summarization (e.g., histograms) to reduce the data that must be stored in a stream archive.
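The replay idea described above can be illustrated with a minimal sketch. This is not the Borealis implementation or its API: the `Reading` type, the `ReplayEngine` class, the tumbling-window sum query, and the `WINDOW` width are all hypothetical choices made for illustration, and the archive here is unbounded rather than limited to recent data.

```python
from dataclasses import dataclass

WINDOW = 10  # assumed tumbling-window width in time units (hypothetical)


@dataclass(frozen=True)
class Reading:
    tid: int    # tuple id that a later revision uses to name its target
    ts: int     # timestamp
    value: int  # e.g., a share price in cents


def window_of(ts):
    """Start time of the tumbling window that contains ts."""
    return ts - ts % WINDOW


class ReplayEngine:
    """Toy engine: per-window sum, plus an input archive used for replay."""

    def __init__(self):
        self.archive = {}  # tid -> archived input tuple (unbounded in this sketch)
        self.outputs = {}  # window start -> most recently emitted sum

    def insert(self, reading):
        """Normal processing: archive the input and update the query result."""
        self.archive[reading.tid] = reading
        w = window_of(reading.ts)
        self.outputs[w] = self.outputs.get(w, 0) + reading.value
        return w, self.outputs[w]

    def revise(self, tid, new_value):
        """Handle a revision tuple by replaying the affected window."""
        old = self.archive[tid]
        # Amend the archived tuple in place of the erroneous one.
        self.archive[tid] = Reading(tid, old.ts, new_value)
        w = window_of(old.ts)
        # Query the archive for every input that contributed to window w.
        replayed = [r for r in self.archive.values() if window_of(r.ts) == w]
        # Replay those inputs through the query to recompute the result.
        corrected = sum(r.value for r in replayed)
        previous = self.outputs[w]
        self.outputs[w] = corrected
        # The revision output names the window and both result values.
        return w, previous, corrected


if __name__ == "__main__":
    engine = ReplayEngine()
    engine.insert(Reading(1, 1, 100))
    engine.insert(Reading(2, 3, 250))  # erroneous price
    engine.insert(Reading(3, 12, 40))
    print(engine.revise(2, 25))        # revision output for window 0
```

In this sketch a revision touches only the window that contained the amended tuple, so only that window's archived inputs are replayed. The cost of keeping every input in `archive` is the kind of overhead that the abstract identifies as a scalability issue.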