Time-Stratified Sampling for Approximate Answers to Aggregate Queries

  • Authors:
  • João Pedro Costa;Pedro Furtado

  • Venue:
  • DASFAA '03 Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
  • Year:
  • 2003

Abstract

In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex aggregate queries based on samples. However, uniformly extracted samples often do not guarantee acceptable accuracy in grouping interval estimations. This is crucial in most less-aggregated analyses, which are mostly based on recent data (e.g., forecasting, performance analysis). We propose the use of time-interval stratified samples (TISS), a simple sampling strategy that biases towards recency. This improves the accuracy in important less-aggregated analysis without significantly deteriorating aggregated analysis on older data. TISS obtains a much better accuracy than either uniform or the recently proposed congressional samples (CS) for queries analyzing recent data and can be coupled with CS to provide minimal representation guarantees (TISS-CS). We discuss TISS design, the loading process and the query processing middle-layer. We show that TISS is very easily integrated in a data warehouse and works transparently. TISS is evaluated experimentally in a TPC-H setup.
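
The general idea of a recency-biased, time-stratified sample can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the stratum boundaries, sampling rates, or estimator, so the function names (`time_stratified_sample`, `estimate_sum`), the three age strata, their rates, and the inverse-rate weighting are all assumptions chosen for illustration.

```python
import random
from datetime import date, timedelta


def time_stratified_sample(rows, today, strata, seed=0):
    """Bernoulli-sample rows with a rate chosen by the age of each row.

    rows   : iterable of (row_date, value) tuples
    strata : list of (max_age_days, sampling_rate), ordered from newest to
             oldest; the last bound should be float("inf")
    Returns a list of (row_date, value, weight) with weight = 1 / rate.
    The strata and rates are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    sample = []
    for row_date, value in rows:
        age = (today - row_date).days
        # Pick the first stratum whose upper age bound covers this row.
        rate = next(r for max_age, r in strata if age <= max_age)
        if rng.random() < rate:
            sample.append((row_date, value, 1.0 / rate))
    return sample


def estimate_sum(sample, since):
    """Inverse-rate weighted estimate of SUM(value) for rows dated on or after `since`."""
    return sum(value * weight for row_date, value, weight in sample if row_date >= since)


# Hypothetical data: one row per day for 1000 days, each with value 1.0.
today = date(2003, 1, 1)
rows = [(today - timedelta(days=i), 1.0) for i in range(1000)]

# Assumed strata: keep everything from the last 30 days, 20% of the last
# year, and 5% of anything older.
strata = [(30, 1.0), (365, 0.2), (float("inf"), 0.05)]
sample = time_stratified_sample(rows, today, strata)
```

With these assumed rates, a query restricted to the most recent stratum is answered exactly, while queries over older data are scaled up by the stored weights and carry sampling error, which mirrors the trade-off the abstract describes.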