Workload-Aware Histograms for Remote Applications

  • Authors:
  • Tanu Malik; Randal Burns

  • Affiliations:
  • Cyber Center, Purdue University; Department of Computer Science, Johns Hopkins University

  • Venue:
  • DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2008

Abstract

Recently, several database-based applications have emerged that are remote from data sources and need accurate histograms for query cardinality estimation. Traditional approaches for constructing histograms require complete access to data and are I/O and network intensive, and therefore no longer apply to these applications. Recent approaches use queries and their feedback to construct and maintain "workload-aware" histograms. However, these approaches either employ heuristics, thereby providing no guarantees on the overall histogram accuracy, or rely on detailed query feedback, thus making them too expensive to use. In this paper, we propose a novel, incremental method for constructing histograms that uses minimum feedback and guarantees minimum overall residual error. Experiments on real, high-dimensional data show 30-40% higher estimation accuracy over currently known heuristic approaches, which translates to significant performance improvement of remote applications.