Mining Conditional Cardinality Patterns for Data Warehouse Query Optimization

Authors:
Mikołaj Morzy;Marcin Krystek
Affiliations:
Institute of Computing Science, Poznan University of Technology, Poznan, Poland 60-965;Poznan Supercomputing and Networking Center, Poznan, Poland 61-704
Venue:
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
Year:
2008

Citing 8
Cited 0

Advances in knowledge discovery and data mining

Advances in knowledge discovery and data mining
STHoles: a multidimensional workload-aware histogram

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Selectivity estimation using probabilistic models

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Exploiting statistics on query expressions for optimization

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Selectivity Estimation Without the Attribute Value Independence Assumption

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Fast Incremental Maintenance of Approximate Histograms

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Automating Statistics Management for Query Optimizers

ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Dynamic Histograms: Capturing Evolving Data Sets

ICDE '00 Proceedings of the 16th International Conference on Data Engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

Data mining algorithms are often embedded in more complex systems, serving as the provider of data for internal decision making within these systems. In this paper we address an interesting problem of using data mining techniques for database query optimization. We introduce the concept of conditional cardinality patterns and design an algorithm to compute the required values for a given database schema. However applicable to any database system, our solution is best suited for data warehouse environments due to the special characteristics of both database schemata being used and queries being asked. We verify our proposal experimentally by running our algorithm against the state-of-the-art database query optimizer. The results of conducted experiments show that our algorithm outperforms traditional cost-based query optimizer with respect to the accuracy of cardinality estimation for a wide range of queries.