Lazy, adaptive rid-list intersection, and its application to index anding

Authors:
Vijayshankar Raman;Lin Qiao;Wei Han;Inderpal Narang;Ying-Lin Chen;Kou-Horng Yang;Fen-Ling Ling
Affiliations:
IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Silicon Valley Lab, San Jose, CA;IBM Silicon Valley Lab, San Jose, CA;IBM Silicon Valley Lab, San Jose, CA
Venue:
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Year:
2007

Abstract

RID-list (row ID list) intersection is a common strategy in query processing, used in star joins, column stores, and even search engines. To apply a conjunction of predicates on a table, a query processor does index lookups to form sorted RID-lists (or bitmaps) of the rows matching each predicate, then intersects the RID-lists via an AND-tree, and finally fetches the corresponding rows to apply any residual predicates and aggregates. This process can be expensive when the RID-lists are large. Furthermore, its performance is sensitive to the order in which the RID-lists are intersected and to the choice of which predicates to treat as residuals. If the optimizer chooses a wrong order or a wrong residual because of a poor cardinality estimate, the resulting plan can run orders of magnitude slower than expected. We present a new algorithm for RID-list intersection that is both more efficient and more robust than this standard algorithm. First, we avoid forming the RID-lists up front, and instead form them lazily as part of the intersection. This significantly reduces the associated I/O and sort cost, especially when the data distribution is skewed. It also ameliorates the problem of wrong residual table selection. Second, we do not intersect the RID-lists via an AND-tree, because this is vulnerable to cardinality misestimation. Instead, we use an adaptive set intersection algorithm that performs well even when the cardinality estimates are wrong. We present detailed experiments with this algorithm on data with varying distributions to validate its efficiency and predictability.
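The two ideas in the abstract (lazy RID-list formation and adaptive intersection) can be illustrated with a small sketch. It assumes that each predicate's index can answer "give me the smallest matching RID greater than or equal to r"; the class `RidCursor` and its `seek` method are hypothetical names, and an in-memory sorted list stands in for the index. The intersection is a generic round-robin ("leapfrog") strategy of the kind used in adaptive set-intersection algorithms, not necessarily the exact algorithm of this paper.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


class RidCursor:
    """Hypothetical cursor over the sorted RIDs matching one predicate.

    seek(rid) returns the smallest matching RID >= rid, or None if there is
    none. A real engine would answer this with an index probe instead of
    materializing the whole RID-list; a sorted list stands in for the index.
    """

    def __init__(self, rids: Sequence[int]) -> None:
        self._rids = rids
        self._pos = 0      # never moves backwards: seek targets only increase
        self.probes = 0    # number of seek() calls issued against this "index"

    def seek(self, rid: int) -> Optional[int]:
        self.probes += 1
        # Binary search over the not-yet-consumed suffix of the list.
        self._pos = bisect_left(self._rids, rid, self._pos)
        if self._pos == len(self._rids):
            return None
        return self._rids[self._pos]


def adaptive_intersect(cursors: List[RidCursor]) -> List[int]:
    """Round-robin ("leapfrog") intersection of sorted RID streams."""
    n = len(cursors)
    if n == 0:
        return []
    out: List[int] = []
    candidate = cursors[0].seek(0)  # assumes RIDs are non-negative integers
    agree = 1                       # number of cursors known to contain candidate
    i = 1 % n                       # index of the next cursor to probe
    while candidate is not None:
        if agree == n:
            # Every predicate matches this RID: emit it and move past it.
            out.append(candidate)
            candidate = cursors[i].seek(candidate + 1)
            agree = 1
        else:
            found = cursors[i].seek(candidate)
            if found is None:
                break               # one predicate is exhausted: no more matches
            if found == candidate:
                agree += 1
            else:
                # This list jumps past candidate, so its RID becomes the new
                # candidate; everything in between is skipped in all lists.
                candidate = found
                agree = 1
        i = (i + 1) % n
    return out


if __name__ == "__main__":
    a = RidCursor([1, 3, 5, 7, 9])
    b = RidCursor([3, 4, 5, 9])
    c = RidCursor([2, 3, 5, 8, 9, 10])
    print(adaptive_intersect([a, b, c]))  # [3, 5, 9]

    # Skewed case: a sparse predicate drives the skipping in a dense one.
    small = RidCursor([5, 50000, 99999])
    big = RidCursor(list(range(100000)))
    print(adaptive_intersect([small, big]), big.probes)  # [5, 50000, 99999] 3
```

In the skewed example the dense list is probed only three times, once per candidate proposed by the sparse list, rather than being scanned in full. No cardinality estimate is consulted: whichever list happens to jump furthest supplies the next candidate, which is the kind of robustness to misestimation the abstract contrasts with a fixed AND-tree order.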