The Star Schema Benchmark and Augmented Fact Table Indexing

Authors:
Patrick O'Neil;Elizabeth O'Neil;Xuedong Chen;Stephen Revilak
Affiliations:
University of Massachusetts at Boston, Boston, USA 02125-3393;University of Massachusetts at Boston, Boston, USA 02125-3393;University of Massachusetts at Boston, Boston, USA 02125-3393;University of Massachusetts at Boston, Boston, USA 02125-3393
Venue:
Performance Evaluation and Benchmarking
Year:
2009

Citing: 0
Cited by: 18

Flashing databases: expectations and limitations
Proceedings of the Sixth International Workshop on Data Management on New Hardware

The effects of virtualization on main memory systems
Proceedings of the Sixth International Workshop on Data Management on New Hardware

Benchmarking spatial data warehouses
DaWaK'10 Proceedings of the 12th international conference on Data warehousing and knowledge discovery

Real-time temporal data warehouse cubing
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II

XWeB: the XML warehouse benchmark
TPCTC'10 Proceedings of the Second TPC technology conference on Performance evaluation, measurement and characterization of complex systems

Benchmarking using basic DBMS operations
TPCTC'10 Proceedings of the Second TPC technology conference on Performance evaluation, measurement and characterization of complex systems

ONE: a predictable and scalable DW model
DaWaK'11 Proceedings of the 13th international conference on Data warehousing and knowledge discovery

A predictable storage model for scalable parallel DW
Proceedings of the 15th Symposium on International Database Engineering & Applications

The SB-index and the HSB-index: efficient indices for spatial data warehouses
Geoinformatica

Cost models for view materialization in the cloud
Proceedings of the 2012 Joint EDBT/ICDT Workshops

Reordering rows for better compression: Beyond the lexicographic order
ACM Transactions on Database Systems (TODS)

Benchmarking summarizability processing in XML warehouses with complex hierarchies
Proceedings of the fifteenth international workshop on Data warehousing and OLAP

Elastic online analytical processing on RAMCloud
Proceedings of the 16th International Conference on Extending Database Technology

Reversing statistics for scalable test databases generation
Proceedings of the Sixth International Workshop on Testing Database Systems

Variations of the star schema benchmark to test the effects of data skew on query performance
Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering

Near real-time with traditional data warehouse architectures: factors and how-to
Proceedings of the 17th International Database Engineering & Applications Symposium

Red Fox: An Execution Environment for Relational Query Processing on GPUs
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization

Real-time data warehouse: a solution and evaluation
International Journal of Business Intelligence and Data Mining

Abstract

We provide a benchmark measuring star schema queries that retrieve data from a fact table with WHERE clause column restrictions on dimension tables. Clustering is crucial to performance with modern disk technology, since retrievals with filter factors down to 0.0005 are now performed most efficiently by sequential table search rather than by indexed access. DB2's Multi-Dimensional Clustering (MDC) provides methods to "dice" the fact table along a number of orthogonal "dimensions", but only when these dimensions are columns in the fact table. The diced cells cluster fact rows on several of these "dimensions" at once, so queries restricting several such columns can access crucially localized data, with much faster query response. Unfortunately, columns of dimension tables of a star schema are not usually represented in the fact table. In this paper, we show a simple way to adjoin physical copies of dimension columns to the fact table, dicing data to effectively cluster query retrieval, and explain how such dicing can be achieved on database products other than DB2. We provide benchmark measurements to show successful use of this methodology on three commercial database products.
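
The idea of adjoining dimension columns to the fact table and dicing on them can be illustrated with a small in-memory sketch. This is not DB2's MDC and not the paper's implementation: a plain sort stands in for physical clustering, and every table, column name (`c_region`, `d_year`, `revenue`), and value below is a hypothetical chosen for illustration.

```python
# Hypothetical miniature star schema; all names and values are illustrative.
customer_dim = {1: "AMERICA", 2: "ASIA", 3: "EUROPE", 4: "ASIA"}  # custkey -> region
date_dim = {19970101: 1997, 19970704: 1997, 19980315: 1998}       # datekey -> year

fact_rows = [
    {"custkey": 2, "datekey": 19970101, "revenue": 100},
    {"custkey": 1, "datekey": 19980315, "revenue": 250},
    {"custkey": 4, "datekey": 19970704, "revenue": 70},
    {"custkey": 3, "datekey": 19970101, "revenue": 40},
    {"custkey": 2, "datekey": 19980315, "revenue": 10},
    {"custkey": 4, "datekey": 19970101, "revenue": 30},
]


def adjoin_dimension_columns(facts, customers, dates):
    """Copy one column from each dimension table into every fact row."""
    return [
        dict(row, c_region=customers[row["custkey"]], d_year=dates[row["datekey"]])
        for row in facts
    ]


def dice(facts, columns):
    """Order rows so that rows sharing all `columns` values (one cell) are adjacent."""
    return sorted(facts, key=lambda row: tuple(row[c] for c in columns))


def cell_ranges(clustered, columns):
    """Map each cell to the half-open position range (start, end) it occupies."""
    ranges = {}
    for pos, row in enumerate(clustered):
        cell = tuple(row[c] for c in columns)
        start, _ = ranges.get(cell, (pos, pos))
        ranges[cell] = (start, pos + 1)
    return ranges


def query_revenue(clustered, ranges, region, year):
    """Sum revenue by scanning only the contiguous range of the matching cell."""
    start, end = ranges.get((region, year), (0, 0))
    return sum(row["revenue"] for row in clustered[start:end])


DICE_COLUMNS = ("c_region", "d_year")
augmented = adjoin_dimension_columns(fact_rows, customer_dim, date_dim)
clustered = dice(augmented, DICE_COLUMNS)
ranges = cell_ranges(clustered, DICE_COLUMNS)

print(query_revenue(clustered, ranges, "ASIA", 1997))  # 200
```

Because the rows of each (region, year) cell sit next to each other after dicing, a query restricting both columns reads one short contiguous range instead of the whole table. In a real database product the ordering would presumably be imposed at load time or through the product's own clustering mechanism rather than by an in-memory sort.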