CORDS: automatic discovery of correlations and soft functional dependencies
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
When query optimizers erroneously assume that database columns are statistically independent, they can underestimate the selectivities of conjunctive predicates by orders of magnitude. Such underestimation often leads to drastically suboptimal query execution plans. We demonstrate CORDS, an efficient and scalable tool for automatic discovery of correlations and soft functional dependencies between column pairs. We apply CORDS to real, synthetic, and TPC-H benchmark data, and show that it discovers correlations in an efficient and scalable manner. The output of CORDS can be visualized graphically, making it a useful mining and analysis tool for database administrators. CORDS ranks the discovered correlated column pairs and recommends to the optimizer a set of statistics to collect for the "most important" of the pairs. Using these statistics reduces processing times by orders of magnitude for a wide range of queries.
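The effect of the independence assumption can be illustrated with a small Python sketch. This is a toy illustration only, not the CORDS algorithm: the table contents, the column names (make, model), and the helper functions `independence_estimate`, `actual_selectivity`, and `soft_fd_strength` are all hypothetical and chosen for this example. It compares the selectivity an optimizer would estimate by multiplying per-column selectivities with the true selectivity of the conjunctive predicate, and computes one simple measure of how close a column pair is to a functional dependency.

```python
from collections import Counter

# Hypothetical toy table of (make, model) rows. Model almost determines make;
# the single ("Honda", "Corolla") row is the exception that makes it "soft".
rows = (
    [("Honda", "Accord")] * 40
    + [("Honda", "Civic")] * 30
    + [("Toyota", "Camry")] * 20
    + [("Toyota", "Corolla")] * 9
    + [("Honda", "Corolla")] * 1
)


def selectivity(rows, predicate):
    """Fraction of rows that satisfy the predicate."""
    return sum(1 for r in rows if predicate(r)) / len(rows)


def independence_estimate(rows, make, model):
    """Estimate for make = X AND model = Y assuming independent columns."""
    s_make = selectivity(rows, lambda r: r[0] == make)
    s_model = selectivity(rows, lambda r: r[1] == model)
    return s_make * s_model


def actual_selectivity(rows, make, model):
    """True selectivity of the conjunctive predicate."""
    return selectivity(rows, lambda r: r == (make, model))


def soft_fd_strength(rows, lhs, rhs):
    """Fraction of rows consistent with lhs -> rhs when every lhs value is
    mapped to its most frequent rhs value (1.0 means an exact dependency)."""
    by_lhs = {}
    for r in rows:
        by_lhs.setdefault(r[lhs], Counter())[r[rhs]] += 1
    kept = sum(counts.most_common(1)[0][1] for counts in by_lhs.values())
    return kept / len(rows)


estimate = independence_estimate(rows, "Honda", "Accord")
actual = actual_selectivity(rows, "Honda", "Accord")
model_to_make = soft_fd_strength(rows, lhs=1, rhs=0)
make_to_model = soft_fd_strength(rows, lhs=0, rhs=1)
```

On this toy table the independence estimate for make = Honda AND model = Accord is lower than the true selectivity, and the model-to-make direction scores much closer to 1.0 than make-to-model, which is the kind of asymmetry a soft functional dependency exhibits. The gap here is small because the table is tiny; the orders-of-magnitude errors described above arise on realistic data with many distinct values.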