A system for semantic query optimization
SIGMOD '87 Proceedings of the 1987 ACM SIGMOD international conference on Management of data
ACM Transactions on Database Systems (TODS)
Towards estimation error guarantees for distinct values
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Exploiting constraint-like data characterizations in query optimization
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Discovery and Application of Check Constraints in DB2
Proceedings of the 17th International Conference on Data Engineering
Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases
CORDS: automatic discovery of correlations and soft functional dependencies
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Approximate encoding for direct access and query processing over compressed bitmaps
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
QUIST: a system for semantic query optimization in relational databases
VLDB '81 Proceedings of the seventh international conference on Very Large Data Bases - Volume 7
Knowledge-based query processing
VLDB '80 Proceedings of the sixth international conference on Very Large Data Bases - Volume 6
BHUNT: automatic discovery of Fuzzy algebraic constraints in relational data
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Adjoined Dimension Column Clustering to Improve Data Warehouse Query Performance
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
UPI: a primary index for uncertain databases
Proceedings of the VLDB Endowment
CORADD: correlation aware database designer for materialized views and indexes
Proceedings of the VLDB Endowment
Predicting cost amortization for query services
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Differential dependencies: Reasoning and discovery
ACM Transactions on Database Systems (TODS)
Design by example for SQL table definitions with functional dependencies
The VLDB Journal — The International Journal on Very Large Data Bases
Optimizing index deployment order for evolving OLAP
Proceedings of the 15th International Conference on Extending Database Technology
In relational query processing, there are generally two choices of access path for a predicate lookup when no clustered index is available. One option is to use an unclustered index. The other is to perform a complete sequential scan of the table. Many analytical workloads do not benefit from unclustered indexes, because the cost of random disk I/O becomes prohibitive for all but the most selective queries. It has been observed that a secondary index on an unclustered attribute can perform well under certain conditions if the unclustered attribute is correlated with a clustered index attribute [4]. The clustered index co-locates values, and the correlation localizes access through the unclustered attribute to a subset of the pages. In this paper, we show that in a real application (SDSS) and a widely used benchmark (TPC-H), there exist many cases of attribute correlation that can be exploited to accelerate queries. We also discuss a tool that can automatically suggest useful pairs of correlated attributes. It does so using an analytical cost model that we developed, which is novel in its awareness of the effects of clustering and correlation. Furthermore, we propose a data structure called a Correlation Map (CM) that expresses the mapping between the correlated attributes, acting much like a secondary index. The paper also discusses how bucketing on the domains of both attributes in the correlated attribute pair can dramatically reduce the size of the CM, making it potentially orders of magnitude smaller than a secondary B+Tree index. This reduction in size allows us to create a large number of CMs that improve performance for a wide range of queries. The small size also reduces maintenance costs, as we demonstrate experimentally.
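
To illustrate the idea of a bucketed mapping between correlated attributes, the following is a minimal sketch in Python, not the implementation described in the paper. It assumes integer-valued attributes and fixed-width buckets; the class name `CorrelationMap`, its methods, the helper `build_demo_map`, and the synthetic data are all hypothetical and chosen only for illustration. Each bucket of the unclustered attribute maps to the set of clustered-attribute buckets in which matching rows were seen, so a lookup returns clustered-key ranges to scan rather than individual row pointers.

```python
from collections import defaultdict


class CorrelationMap:
    """Toy bucketed map: unclustered-attribute bucket -> clustered-attribute buckets.

    Hypothetical sketch; assumes non-negative integer attribute values.
    """

    def __init__(self, u_width, c_width):
        self.u_width = u_width  # bucket width on the unclustered attribute
        self.c_width = c_width  # bucket width on the clustered attribute
        self.mapping = defaultdict(set)

    def insert(self, u_value, c_value):
        # Record that some row with this unclustered bucket lives in this
        # clustered bucket. Repeated pairs cost nothing extra.
        self.mapping[u_value // self.u_width].add(c_value // self.c_width)

    def lookup(self, u_value):
        # Return half-open clustered-key ranges [lo, hi) that may hold matches.
        # Bucketing can produce false positives, so the caller must still
        # re-check the predicate on every row it reads from those ranges.
        buckets = sorted(self.mapping.get(u_value // self.u_width, ()))
        return [(b * self.c_width, (b + 1) * self.c_width) for b in buckets]


def build_demo_map():
    # Synthetic table: clustered key c in 0..999, unclustered attribute
    # u = c // 10, which is perfectly correlated with c by construction.
    cm = CorrelationMap(u_width=5, c_width=100)
    for c in range(1000):
        cm.insert(c // 10, c)
    return cm


if __name__ == "__main__":
    cm = build_demo_map()
    print(len(cm.mapping))  # number of map entries, versus 1000 rows
    print(cm.lookup(42))    # clustered-key ranges to scan for u = 42
```

In this synthetic example, 1000 rows collapse into 20 map entries, and a lookup on the unclustered attribute is answered by scanning a single clustered-key range. Wider buckets shrink the map further at the cost of scanning more rows that do not match, which is the size-versus-precision trade-off that bucketing introduces.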