Optimization of multiple-relation multiple-disjunct queries

  • Authors:
  • M. Muralikrishna;David J. DeWitt

  • Affiliations:
  • Computer Sciences Department, University of Wiscsonsin, Madison, WI;Computer Sciences Department, University of Wiscsonsin, Madison, WI

  • Venue:
  • Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
  • Year:
  • 1988

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we discuss the optimization of multiple-relation multiple-disjunct queries in a relational database system. Since optimization techniques for conjunctive (single disjunct) queries in relational databases are well known [Smith75, Wong76, Selinger79, Yao79, Youssefi79], the natural way to evaluate a multiple-disjunct query was to execute each disjunct independently [Bernstein81, Kerschberg82] However, evaluating each disjunct independently may be very inefficient. In this paper, we develop methods that merge two or more disjuncts to form a term. The advantage of merging disjuncts to form terms lies in the fact that each term can be evaluated with a single scan of each relation that is present in the term. In addition, the number of times a join is performed will also be reduced when two or more disjuncts are merged. The criteria for merging a set of disjuncts will be presented. As we will see, the number of times each relation in the query is scanned will be equal to the number of terms. Thus, minimizing the number of terms will minimize the number of scans for each relation. We will formulate the problem of minimizing the number of scans as one of covering a merge graph by a minimum number of complete merge graphs which are a restricted class of Cartesian product graphs. In general, the problem of minimizing the number of scans is NP-complete. We present polynomial time algorithms for special classes of merge graphs. We also present a heuristic for general merge graphs.Throughout this paper, we will assume that no relations have any indices on them and that we are only concerned with reducing the number of scans for each relation present in the query. What about relations that have indices on them? It turns out that our performance metric of reducing the number of scans is beneficial even in the case that there are indices. In [Muralikrishna88] we demonstrate that when optimizing single-relation multiple-disjunct queries, the cost (measured in terms of disk accesses) may be reduced if all the disjuncts are optimized together rather than individually. Thus, our algorithm for minimizing the number of terms is also very beneficial in cases where indices exist