The Star Schema Benchmark and Augmented Fact Table Indexing

  • Authors: Patrick O'Neil; Elizabeth O'Neil; Xuedong Chen; Stephen Revilak
  • Affiliation (all authors): University of Massachusetts at Boston, Boston, USA 02125-3393
  • Venue: Performance Evaluation and Benchmarking
  • Year: 2009

Abstract

We provide a benchmark measuring star schema queries retrieving data from a fact table with Where clause column restrictions on dimension tables. Clustering is crucial to performance with modern disk technology, since retrievals with filter factors down to 0.0005 are now performed most efficiently by sequential table search rather than by indexed access. DB2's Multi-Dimensional Clustering (MDC) provides methods to "dice" the fact table along a number of orthogonal "dimensions", but only when these dimensions are columns in the fact table. The diced cells cluster fact rows on several of these "dimensions" at once so queries restricting several such columns can access crucially localized data, with much faster query response. Unfortunately, columns of dimension tables of a star schema are not usually represented in the fact table. In this paper, we show a simple way to adjoin physical copies of dimension columns to the fact table, dicing data to effectively cluster query retrieval, and explain how such dicing can be achieved on database products other than DB2. We provide benchmark measurements to show successful use of this methodology on three commercial database products.
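
The idea of adjoining dimension columns to the fact table and dicing on them can be illustrated with a small simulation. The sketch below is not the paper's implementation or benchmark: the table layout, the column names (`custkey`, `datekey`, `c_region`, `d_year`), the page size, and the helper functions are all assumptions chosen for illustration. It copies two dimension columns into each fact row, orders the rows by those columns so that each (region, year) cell is contiguous, and counts how many simulated pages a query restricting both columns would have to read.

```python
import random

PAGE_SIZE = 100  # rows per simulated disk page (assumed value, for illustration only)


def build_tables(n_facts=10_000, n_customers=500, n_dates=70, seed=0):
    """Build toy dimension tables and a fact table holding only foreign keys."""
    rng = random.Random(seed)
    regions = ["AFRICA", "AMERICA", "ASIA", "EUROPE", "MIDDLE EAST"]
    customer = {ck: regions[ck % len(regions)] for ck in range(n_customers)}
    date = {dk: 1992 + dk % 7 for dk in range(n_dates)}
    facts = [
        {
            "custkey": rng.randrange(n_customers),
            "datekey": rng.randrange(n_dates),
            "revenue": rng.randrange(1, 1000),
        }
        for _ in range(n_facts)
    ]
    return customer, date, facts


def adjoin_dimension_columns(facts, customer, date):
    """Return fact rows extended with physical copies of two dimension columns."""
    return [
        dict(row, c_region=customer[row["custkey"]], d_year=date[row["datekey"]])
        for row in facts
    ]


def pages_touched(rows, predicate):
    """Count the distinct simulated pages holding at least one qualifying row."""
    return len({i // PAGE_SIZE for i, row in enumerate(rows) if predicate(row)})


def query(row):
    """Restriction on two adjoined dimension columns."""
    return row["c_region"] == "ASIA" and row["d_year"] == 1995


customer, date, facts = build_tables()
augmented = adjoin_dimension_columns(facts, customer, date)

# "Dice" the fact table: order rows so each (region, year) cell is contiguous.
clustered = sorted(augmented, key=lambda r: (r["c_region"], r["d_year"]))

unclustered_pages = pages_touched(augmented, query)
clustered_pages = pages_touched(clustered, query)
matching_rows = sum(1 for r in augmented if query(r))

print(f"matching rows:             {matching_rows}")
print(f"pages read (load order):   {unclustered_pages}")
print(f"pages read (diced order):  {clustered_pages}")
```

In this toy setting the qualifying rows are scattered across most pages in load order, while in the diced order they occupy one contiguous run. A plain sort stands in here for whatever clustering mechanism a given database product offers.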