Querying priced information in databases: The conjunctive case

  • Authors:
  • Renato Carmo;Tomás Feder;Yoshiharu Kohayakawa;Eduardo Laber;Rajeev Motwani;Liadan O'Callaghan;Rina Panigrahy;Dilys Thomas

  • Affiliations:
  • Universidade Federal do Paraná, Curitiba, PR Brasil;Stanford University, Stanford, CA;Universidade de São Paulo, São Paulo, Brasil;Pontifícia Universidade Católica do Rio de Janeiro, Sala;Stanford University, Stanford, CA;Stanford University, Stanford, CA;Cisco Systems, San Jose, CA;Stanford University, Stanford, CA

  • Venue:
  • ACM Transactions on Algorithms (TALG)
  • Year:
  • 2007

Abstract

Query optimization that involves expensive predicates has received considerable attention in the database community. Typically, the output of a database query is a set of tuples that satisfy certain conditions, and, with expensive predicates, these conditions may be computationally costly to verify. In the simplest case, when the query looks for the set of tuples that simultaneously satisfy k expensive predicates, the problem reduces to ordering the evaluation of the predicates so as to minimize the time to output the set of tuples comprising the answer to the query. We study different cases of the problem: the sequential case, in which a single processor is available to evaluate the predicates, and the distributed case, in which there are k processors available, each dedicated to a different attribute (column) of the database, and there is no communication cost between the processors.

For the sequential case, we give a simple and fast deterministic k-approximation algorithm, and prove that k is the best possible approximation ratio for a deterministic algorithm, even if exponential-time algorithms are allowed. We also propose a randomized, polynomial-time algorithm with expected approximation ratio 1 + √2/2 ≈ 1.707 for k = 2, and prove that 3/2 is the best possible expected approximation ratio for randomized algorithms. We also show that, given 0 ≤ ϵ ≤ 1, no randomized algorithm achieves an approximation ratio smaller than 1 + ϵ with probability larger than (1 + ϵ)/2.

For the distributed case, we consider two different models: the preemptive model, in which a processor is allowed to interrupt the evaluation of a predicate, and the nonpreemptive model, in which the evaluation of a predicate must be completed once started. We show that k is the best possible approximation ratio for a deterministic algorithm, even if exponential-time algorithms are allowed. For the preemptive model, we introduce a polynomial-time k-approximation algorithm. For the nonpreemptive model, we introduce a polynomial-time O(k log² k)-approximation algorithm.
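The abstract does not spell out the model or the sequential k-approximation, so the following is only a sketch under an assumed formalization (not taken from the text): each column's distinct values are vertices with an integer probe price, each tuple is a hyperedge holding one value per column, and evaluating a predicate on a value reveals its truth for every tuple sharing that value. The rule used is a local-ratio scheme in the spirit of Bar-Yehuda and Even's weighted vertex-cover algorithm, which is not necessarily the authors' algorithm; `local_ratio_probe` and `offline_optimum` are hypothetical names. The intuition for the factor k: each step lowers at most k residual prices by δ, while every certificate must contain at least one still-unprobed value of the current undecided tuple, so the total paid is at most k·Σδ ≤ k·OPT.

```python
from itertools import combinations


def local_ratio_probe(tuples, cost, truth):
    """Hypothetical k-approximation sketch (local-ratio style).

    tuples: list of k-tuples of value ids (one value per column; ids assumed
            distinct across columns).
    cost:   dict value -> nonnegative int probe price (ints avoid float drift).
    truth:  dict value -> bool; acts as the oracle and is read only on a probe.
    Returns (indices of tuples in the answer, total price paid, probes made).
    """
    residual = dict(cost)
    probed = {}  # value -> revealed boolean

    def decided(t):
        # A tuple is decided once any probed value is false or all are probed.
        return any(probed.get(v) is False for v in t) or all(v in probed for v in t)

    for t in tuples:
        while not decided(t):
            open_vals = [v for v in set(t) if v not in probed]
            delta = min(residual[v] for v in open_vals)
            for v in open_vals:  # at most k values are charged delta
                residual[v] -= delta
                if residual[v] == 0:
                    probed[v] = truth[v]  # pay cost[v], reveal the predicate
    answer = {i for i, t in enumerate(tuples) if all(probed.get(v) is True for v in t)}
    return answer, sum(cost[v] for v in probed), probed


def offline_optimum(tuples, cost, truth):
    """Cheapest certificate when truth values are known (brute force, tiny inputs)."""
    true_tuples = [t for t in tuples if all(truth[v] for v in t)]
    false_tuples = [t for t in tuples if not all(truth[v] for v in t)]
    must = {v for t in true_tuples for v in t}  # a true tuple needs every value probed
    false_vals = [v for v in {v for t in tuples for v in t} if not truth[v]]
    best = None
    for r in range(len(false_vals) + 1):
        for cover in combinations(false_vals, r):
            chosen = set(cover)
            # Each false tuple needs at least one of its false values probed.
            if all(chosen & set(t) for t in false_tuples):
                c = sum(cost[v] for v in chosen)
                best = c if best is None else min(best, c)
    return sum(cost[v] for v in must) + best


# Toy k = 2 instance: columns A = {a1, a2}, B = {b1, b2}.
tuples = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")]
cost = {"a1": 3, "a2": 1, "b1": 2, "b2": 5}
truth = {"a1": True, "a2": False, "b1": True, "b2": False}
answer, total, _ = local_ratio_probe(tuples, cost, truth)
opt = offline_optimum(tuples, cost, truth)
print(answer, total, opt)  # {0} 11 11
```

On this toy instance the sketch happens to pay exactly the brute-force optimum. A single tuple also illustrates why a deterministic algorithm cannot beat k in this model: give it k values of price 1 and let an adversary declare each probed value true until the last one, which is false; any deterministic order then pays k, while the cheapest certificate pays 1.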