Data stream processing with concurrency control

Authors:
Masafumi Oyamada;Hideyuki Kawashima;Hiroyuki Kitagawa
Affiliations:
NEC Cloud System Research Labs, Japan;University of Tsukuba, Japan;University of Tsukuba, Japan
Venue:
ACM SIGAPP Applied Computing Review
Year:
2013

Abstract

A recent trend in data stream processing is the use of advanced continuous queries (CQs) that reference non-streaming resources, such as relational data in databases and machine learning models. Because non-streaming resources can be shared among multiple systems, those systems may update the resources while a CQ is executing. As a consequence, CQs may reference resources inconsistently, leading to problems that range from inappropriate results to fatal system failures. In this paper, we address this inconsistency problem by introducing the concept of transaction processing into data stream processing. In the first part of this paper, we introduce the CQ-derived transaction, a concept that derives read-only transactions from CQs, and show that the inconsistency problem is solved by ensuring serializability of the derived transactions and the resource-updating transactions. To ensure serializability, we propose three CQ-processing strategies based on concurrency control techniques: a two-phase lock strategy, a snapshot strategy, and an optimistic strategy. An experimental study shows that our CQ-processing strategies guarantee proper results and that their performance is comparable to that of the conventional strategy, which can produce improper results. In the second part of this paper, we improve the performance of the proposed strategies from the viewpoint of operator scheduling. We observe a characteristic of the proposed strategies: operators may be re-evaluated to prevent non-serializable schedules, and this re-evaluation degrades performance. We find that the number of operator re-evaluations depends on operator scheduling, and we propose a scheduling constraint that reduces them. An experimental study demonstrates the constraint's effectiveness: adding it to operator scheduling improves throughput by a factor of up to 5.2 compared with naïve scheduling without the constraint.
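
The inconsistency problem and the idea behind an optimistic strategy can be illustrated with a small sketch. This is not the authors' implementation; it is a single-threaded simulation under stated assumptions. The names `VersionedResource`, `run_cq`, and `pending_updates`, the two-operator query, and the exchange-rate data are all hypothetical. The CQ reads a shared resource in two operators, an update arrives between them, and validation against a version counter decides whether the operators must be re-evaluated.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

StreamTuple = Tuple[str, float]  # (key, amount); hypothetical schema


@dataclass
class VersionedResource:
    """A shared non-streaming resource with a version counter (assumption)."""
    rates: Dict[str, float]
    version: int = 0

    def update(self, new_rates: Dict[str, float]) -> None:
        # Models a resource-updating transaction that commits atomically.
        self.rates = dict(new_rates)
        self.version += 1


def run_cq(
    tuples: List[StreamTuple],
    resource: VersionedResource,
    pending_updates: List[Dict[str, float]],
    validate: bool = True,
) -> Tuple[List[StreamTuple], int]:
    """Evaluate a two-operator CQ; return (results, number of re-evaluations).

    `pending_updates` simulates external updates that commit between the
    two operators. With validate=False the reads are unprotected.
    """
    re_evaluations = 0
    while True:
        start_version = resource.version  # start of the read phase
        # Operator 1: keep tuples whose rate is above 1.0.
        selected = [(k, amt) for k, amt in tuples if resource.rates[k] > 1.0]
        # Simulated concurrent update between the two operators.
        if pending_updates:
            resource.update(pending_updates.pop(0))
        # Operator 2: convert each selected amount with the current rate.
        results = [(k, amt * resource.rates[k]) for k, amt in selected]
        # Validation: accept only if both operators saw the same version.
        if not validate or resource.version == start_version:
            return results, re_evaluations
        re_evaluations += 1  # discard the results and re-evaluate


if __name__ == "__main__":
    stream = [("a", 10.0), ("b", 10.0)]
    before = {"a": 2.0, "b": 0.5}
    after = {"a": 0.5, "b": 2.0}
    print(run_cq(stream, VersionedResource(dict(before)), [after], validate=False))
    print(run_cq(stream, VersionedResource(dict(before)), [after], validate=True))
```

Without validation, the sketch returns `[("a", 5.0)]`, which matches neither the state before the update (`[("a", 20.0)]`) nor the state after it (`[("b", 20.0)]`). With validation, the operators are re-evaluated once and the result is consistent with the updated resource. The re-evaluation count returned here corresponds loosely to the cost that the abstract's scheduling constraint aims to reduce.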