Optimizing complex queries with multiple relation instances

  • Authors:
  • Yu Cao;Gopal C. Das;Chee-Yong Chan;Kian-Lee Tan

  • Affiliations:
  • National University of Singapore, Singapore, Singapore;National University of Singapore, Singapore, Singapore;National University of Singapore, Singapore, Singapore;National University of Singapore, Singapore, Singapore

  • Venue:
  • Proceedings of the 2008 ACM SIGMOD international conference on Management of data
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Today's query processing engines do not take advantage of the multiple occurrences of a relation in a query to improve performance. Instead, each instance is treated as a distinct relation and has its own independent table access method. In this paper, we present MAPLE, a Multi-instance-Aware PLan Evaluation engine that enables multiple instances of a relation to share one physical scan (called SharedScan) with limited buffer space. During execution, as SharedScan pulls a tuple for any instance, that tuple is also pushed to the buffers of other instances with matching predicates. To avoid buffer overflow, a novel interleaved execution strategy is proposed: whenever an instance's buffer becomes full, the execution is temporarily switched to a drainer (an ancestor blocking operator of the instance) to consume all the tuples in the buffer. Thus, the execution is interleaved between normal processing and drainers. We also propose a cost-based approach to generate a plan to maximize the shared scan benefit as well as to avoid interleaved execution deadlocks. MAPLE is light-weight and can be easily integrated into existing RDBMS executors. We have implemented MAPLE in PostgreSQL, and our experimental study on the TPC-DS benchmark shows significant reduction in execution time.