Update propagation in a streaming warehouse

  • Authors:
  • Theodore Johnson;Vladislav Shkapenyuk

  • Affiliations:
  • AT&T Labs-Research;AT&T Labs-Research

  • Venue:
  • SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Streaming warehouses are used to monitor complex systems such as data centers, web site complexes, and world-wide networks, gathering and correlating rich collections of events and measurements. Ideally, a streaming warehouse provides both historical data, for deep analysis, and real-time data for rapid response to emerging opportunities or problems. The highly temporal nature of the data and the need to support parallel processing naturally leads to extensive use of horizontal partitioning to manage base tables and layers of materialized views. In this paper, we consider the problem of determining when to propagate updates from base tables to dependent views on a partition-wise basis using autonomous updates. We provide a correctness theory for propagating updates to materialized views, simple algorithms which correctly propagate updates, and examples of algorithms which do not. We extend these results to accommodate needs of production warehouses: repartitioning of tables, mutual consistency, and merge tables. We measure the update propagation delays incurred by two different update propagation algorithms in test and production DataDepot warehouses, and find that only those update propagation algorithms which impose no scheduling restrictions are acceptable for use in a real-time streaming warehouse.