Data stream processing with concurrency control

  • Authors: Masafumi Oyamada; Hideyuki Kawashima; Hiroyuki Kitagawa
  • Affiliations: NEC Cloud System Research Labs, Japan; University of Tsukuba, Japan; University of Tsukuba, Japan
  • Venue: ACM SIGAPP Applied Computing Review
  • Year: 2013

Abstract

A recent trend in data stream processing is the use of advanced continuous queries (CQs) that reference non-streaming resources, such as relational data in databases and machine learning models. Because non-streaming resources may be shared among multiple systems, they can be updated by those systems while a CQ is executing. As a consequence, CQs may reference resources inconsistently, leading to problems that range from inappropriate results to fatal system failures. In this paper, we address this inconsistency problem by introducing the concept of transaction processing into data stream processing. In the first part of the paper, we introduce the CQ-derived transaction, a concept that derives read-only transactions from CQs, and show that the inconsistency problem is solved by ensuring the serializability of derived transactions and resource-updating transactions. To ensure serializability, we propose three CQ-processing strategies based on concurrency control techniques: a two-phase lock strategy, a snapshot strategy, and an optimistic strategy. An experimental study shows that our CQ-processing strategies guarantee proper results and that their performance is comparable to that of the conventional strategy, which can produce improper results. In the second part of the paper, we improve the performance of the proposed strategies from the viewpoint of operator scheduling. We observe a characteristic of the proposed strategies: operators may be re-evaluated to prevent non-serializable schedules, and this re-evaluation degrades performance. We find that the number of operator re-evaluations depends on operator scheduling, and we propose a scheduling constraint that reduces re-evaluation. An experimental study shows the constraint's effectiveness: when the proposed constraint is added to operator scheduling, throughput increases up to 5.2 times compared to naïve scheduling without the constraint.
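The abstract does not spell out how the strategies work, so the following is only an illustrative sketch of the general idea behind an optimistic approach, not the authors' implementation. It assumes a shared resource that carries a version counter, and it treats the evaluation of one tuple through a chain of operators as a read-only transaction: if any operator sees a version different from the one observed at the start, the whole chain is re-evaluated. The names `VersionedResource`, `process_tuple_optimistic`, and the toy operators are all hypothetical.

```python
import threading


class VersionedResource:
    """Hypothetical shared non-streaming resource (e.g., a lookup value)."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value
        self._version = 0

    def read(self):
        # Return the value together with the version it belongs to.
        with self._lock:
            return self._value, self._version

    def update(self, value):
        # An updating transaction installs a new value and bumps the version.
        with self._lock:
            self._value = value
            self._version += 1


def process_tuple_optimistic(tup, operators, resource, max_retries=10):
    """Evaluate `operators` over one tuple as a read-only transaction.

    Each operator is a function (value, resource_value) -> value.
    Returns (result, number_of_re_evaluations).
    """
    for attempt in range(max_retries):
        _, start_version = resource.read()
        value = tup
        consistent = True
        for op in operators:
            res_value, seen_version = resource.read()
            if seen_version != start_version:
                # The resource changed mid-query: discard and re-evaluate.
                consistent = False
                break
            value = op(value, res_value)
        if consistent:
            return value, attempt
    raise RuntimeError("too many re-evaluations")


# Usage: the first operator simulates a concurrent update on its first call.
rate = VersionedResource(2)
calls = {"n": 0}


def scale(v, r):
    return v * r


def scale_with_interference(v, r):
    calls["n"] += 1
    if calls["n"] == 1:
        rate.update(3)  # simulated update by another system
    return v * r


result, retries = process_tuple_optimistic(
    5, [scale_with_interference, scale], rate
)
print(result, retries)  # prints "45 1"
```

Without the version check, the two operators would have combined the old value 2 and the new value 3 and produced 30, a result that corresponds to no single state of the resource. The re-evaluation is also the cost that the scheduling constraint in the second part of the paper aims to reduce.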