Run-time operator state spilling for memory intensive long-running queries

  • Authors:
  • Bin Liu;Yali Zhu;Elke Rundensteiner

  • Affiliations:
  • Worcester Polytechnic Institute;Worcester Polytechnic Institute;WPI

  • Venue:
  • Proceedings of the 2006 ACM SIGMOD international conference on Management of data
  • Year:
  • 2006

Abstract

Main memory is a critical resource when processing long-running queries over data streams with state-intensive operators. In this work, we investigate state spill strategies that handle run-time memory shortage when processing such complex queries by selectively pushing operator states to disk. Unlike previous solutions, which all focus on a single operator, we target queries with multiple state-intensive operators. We observe an interdependency among multiple operators in the query plan when spilling operator states. We illustrate that existing strategies, which do not take this interdependency into account, become largely ineffective in this query context. A consolidated plan-level spill strategy must therefore be devised to address this problem. Several data spill strategies are proposed in this paper to maximize the run-time query throughput in memory-constrained environments. The bottom-up state spill strategy is an operator-level strategy that treats all data in one operator state equally. More sophisticated partition-level data spill strategies are then proposed to take different characteristics of the input data into account: the local output, the global output, and the global output with penalty strategies. All proposed state spill strategies have been implemented in the D-CAPE continuous query system. The experimental results confirm the effectiveness of our proposed strategies. In particular, the global output strategy and the global output with penalty strategy have shown favorable results compared to the other two more localized strategies.
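
The difference between a local and a global partition-level strategy can be illustrated with a small sketch. The abstract does not give the ranking metrics, so everything below is an assumption made for illustration: the `Partition` fields, the "output per unit of size" score, and the greedy `pick_spill` helper are hypothetical and are not taken from the paper or from D-CAPE. The penalty variant is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    op: str          # operator that owns this state partition (hypothetical name)
    pid: int         # partition id within that operator
    size: int        # memory held by the partition, e.g. number of tuples
    local_out: int   # tuples this partition produced at its own operator
    global_out: int  # tuples it contributed to the final query output


def pick_spill(partitions, need, metric):
    """Greedily pick the least productive partitions until `need` units are freed.

    Productivity is assumed to be output / size; `metric` selects which
    output counter ("local_out" or "global_out") is used for ranking.
    """
    ranked = sorted(partitions, key=lambda p: getattr(p, metric) / p.size)
    chosen, freed = [], 0
    for p in ranked:
        if freed >= need:
            break
        chosen.append(p)
        freed += p.size
    return chosen


# Made-up numbers: join1/0 looks productive locally, but almost none of its
# output survives the downstream operator.
state = [
    Partition("join1", 0, size=100, local_out=500, global_out=5),
    Partition("join1", 1, size=100, local_out=50, global_out=40),
    Partition("join2", 0, size=100, local_out=200, global_out=200),
]

local_choice = pick_spill(state, need=100, metric="local_out")
global_choice = pick_spill(state, need=100, metric="global_out")
```

With these invented numbers, ranking by local output spills partition 1 of `join1`, while ranking by global output spills partition 0 of `join1`, which contributes least to the final result. This is one way to picture the interdependency among operators that the abstract describes.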