The pipelined set cover problem

  • Authors:
  • Kamesh Munagala; Shivnath Babu; Rajeev Motwani; Jennifer Widom

  • Affiliations:
  • Computer Science Department, Duke University; Computer Science Department, Stanford University; Computer Science Department, Stanford University; Computer Science Department, Stanford University

  • Venue:
  • ICDT'05 Proceedings of the 10th international conference on Database Theory
  • Year:
  • 2005

Abstract

A classical problem in query optimization is to find the optimal ordering of a set of possibly correlated selections. We provide an abstraction of this problem as a generalization of set cover called pipelined set cover, where the sets are applied sequentially to the elements to be covered and the elements covered at each stage are discarded. We show that several natural heuristics for this NP-hard problem, such as the greedy set-cover heuristic and a local-search heuristic, can be analyzed using a linear-programming framework. These heuristics lead to efficient algorithms for pipelined set cover that can be applied to order possibly correlated selections in conventional database systems as well as data-stream processing systems. We use our linear-programming framework to show that the greedy and local-search algorithms are 4-approximations for pipelined set cover. We extend our analysis to minimize the L_p norm of the costs paid by the sets, where p ≥ 2 is an integer, to examine the improvement in performance when the total cost has increasing contribution from initial sets in the pipeline. Finally, we consider the online version of pipelined set cover and present a competitive algorithm with a logarithmic performance guarantee. Our analysis framework may be applicable to other problems in query optimization where it is important to account for correlations.
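
The abstract does not spell out the cost model or the greedy rule, so the following Python is only a minimal sketch under stated assumptions: each set (a selection predicate) has a per-element cost, every element still uncovered when a set is applied pays that cost, and the greedy heuristic repeatedly picks the set that covers the most remaining elements per unit cost. The function names `pipeline_cost` and `greedy_order` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, Iterable, List, Set


def pipeline_cost(order: Iterable[str],
                  sets: Dict[str, Set[Hashable]],
                  costs: Dict[str, float],
                  universe: Set[Hashable]) -> float:
    """Total cost of applying the sets in the given order.

    Assumed cost model: every element still uncovered when a set is
    applied pays that set's cost; covered elements are then discarded.
    """
    remaining = set(universe)
    total = 0.0
    for name in order:
        total += costs[name] * len(remaining)
        remaining -= sets[name]
    return total


def greedy_order(sets: Dict[str, Set[Hashable]],
                 costs: Dict[str, float],
                 universe: Set[Hashable]) -> List[str]:
    """Greedy ordering: pick the set covering the most remaining
    elements per unit cost (an assumed form of the greedy rule)."""
    remaining = set(universe)
    unused = set(sets)
    order: List[str] = []
    while unused and remaining:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(unused),
                   key=lambda n: len(sets[n] & remaining) / costs[n])
        order.append(best)
        unused.remove(best)
        remaining -= sets[best]
    # Sets applied after everything is covered add no cost in this model.
    order.extend(sorted(unused))
    return order


# Toy instance (hypothetical): three unit-cost, overlapping selections.
universe = {1, 2, 3, 4, 5, 6}
sets = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}}
costs = {"A": 1.0, "B": 1.0, "C": 1.0}

order = greedy_order(sets, costs, universe)
print(order)                                                   # ['A', 'C', 'B']
print(pipeline_cost(order, sets, costs, universe))             # 8.0
print(pipeline_cost(["B", "A", "C"], sets, costs, universe))   # 10.0
```

In this toy instance, sets A and B overlap on elements 3 and 4, so once A has been applied, B covers only one new element and the greedy rule places C ahead of it. This is the kind of correlation between selections that the abstract says an ordering must account for.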