Eddies: continuously adaptive query processing
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Adaptive set intersections, unions, and differences
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
Adaptive intersection and t-threshold problems
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Holistic twig joins: optimal XML pattern matching
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
LEO - DB2's LEarning Optimizer
Proceedings of the 27th International Conference on Very Large Data Bases
Bucket Skip Merge Joi: A Scalable Algorithm for Join Processing in Very Large Databases using Indexes
Adaptive ordering of pipelined stream filters
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Declarative routing: extensible routing with declarative queries
Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications
Cost-based optimization in DB2 XML
IBM Systems Journal
Faster adaptive set intersections for text searching
WEA'06 Proceedings of the 5th international conference on Experimental Algorithms
Secondary indexing in one dimension: beyond b-trees and bitmap indexes
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Authenticated join processing in outsourced databases
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Improving the performance of list intersection
Proceedings of the VLDB Endowment
Workload-aware indexing for keyword search in social networks
Proceedings of the 20th ACM international conference on Information and knowledge management
Column-oriented query processing for row stores
Proceedings of the ACM 14th international workshop on Data Warehousing and OLAP
Efficient query processing for XML keyword queries based on the IDList index
The VLDB Journal — The International Journal on Very Large Data Bases
Hi-index | 0.00 |
RID-List (row id list) intersection is a common strategy in query processing, used in star joins, column stores, and even search engines. To apply a conjunction of predicates on a table, a query process ordoes index lookups to form sorted RID-lists (or bitmap) of the rows matching each predicate, then intersects the RID-lists via an AND-tree, and finally fetches the corresponding rows to apply any residual predicates and aggregates. This process can be expensive when the RID-lists are large. Furthermore, the performance is sensitive to the order in which RID lists are intersected together, and to treating the right predicates as residuals. If the optimizer chooses a wrong order or a wrong residual, due to a poor cardinality estimate, the resulting plan can run orders of magnitude slower than expected. We present a new algorithm for RID-list intersection that is both more efficient and more robust than this standard algorithm. First, we avoid forming the RID-lists up front, and instead form this lazily as part of the intersection. This reduces the associated IO and sort cost significantly, especially when the data distribution is skewed. It also ameliorates the problem of wrong residual table selection. Second, we do not intersect the RID-lists via an AND-tree, because this is vulnerable to cardinality mis-estimations. Instead, we use an adaptive set intersection algorithm that performs well even when the cardinality estimates are wrong. We present detailed experiments of this algorithm on data with varying distributions to validate its efficiency and predictability.