Towards benchmarking stream data warehouses

  • Authors:
  • Arian Bär; Lukasz Golab

  • Affiliations:
  • FTW, Vienna, Austria; University of Waterloo, Waterloo, ON, Canada

  • Venue:
  • Proceedings of the Fifteenth International Workshop on Data Warehousing and OLAP
  • Year:
  • 2012

Abstract

Data management systems face two challenges driven by the requirements of emerging data-intensive applications: more data and less time to process it. Data volumes continue to grow as new sources and data collection mechanisms appear. At the same time, these sources tend to be highly dynamic and generate data in the form of a stream, which requires reacting quickly to newly arrived data. Traditional data warehouses enable scalable data storage and analytics, including the ability to define nested levels of materialized views. However, views are typically refreshed during downtimes (e.g., every night), which does not meet the latency requirements of many applications. Stream data warehousing is a new data management technology that allows nearly continuous view refresh as new data arrive, enabling seamless integration of real-time monitoring and business intelligence with long-term data mining. In this paper, we argue that stream warehouses require a new benchmark, one that focuses on the property that determines their utility: how well they keep up with the incoming data and guarantee the "freshness" of materialized views.
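As a rough illustration of the freshness property, the minimal sketch below assumes that the staleness of a materialized view at time t is t minus the timestamp of the newest data the view reflects, and that a benchmark might average this value over sampled times. The functions `staleness_at` and `mean_staleness` and the (completion time, data watermark) representation of refreshes are hypothetical illustrations, not the metric or interface defined in the paper.

```python
from bisect import bisect_right


def staleness_at(t, refreshes):
    """Staleness of a view at time t (hypothetical definition).

    Assumption: refreshes is a list of (completion_time, data_watermark)
    pairs sorted by completion_time, where data_watermark is the newest
    arrival timestamp reflected in the view once that refresh completes
    (watermarks are assumed non-decreasing).
    """
    completion_times = [c for c, _ in refreshes]
    # Index of the last refresh that completed at or before time t.
    i = bisect_right(completion_times, t)
    if i == 0:
        return None  # the view has not been populated yet at time t
    return t - refreshes[i - 1][1]


def mean_staleness(sample_times, refreshes):
    """Average staleness over the sample times at which the view exists."""
    values = [s for s in (staleness_at(t, refreshes) for t in sample_times)
              if s is not None]
    return sum(values) / len(values) if values else None


# Hypothetical comparison, times in hours: a nightly refresh finishing at
# 02:00 (covering data up to midnight) versus an hourly refresh that lags
# the incoming data by one hour.
nightly = [(2, 0), (26, 24)]
hourly = [(t, t - 1) for t in range(2, 27)]
print(mean_staleness(range(2, 26), nightly))
print(mean_staleness(range(2, 26), hourly))
```

Under these assumptions, the nightly schedule averages 13.5 hours of staleness over the sampled day, while the hourly schedule stays at 1 hour. That gap is the kind of difference a freshness-oriented benchmark would aim to expose.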