Making queries tractable on big data with preprocessing: through the eyes of complexity theory

  • Authors: Wenfei Fan; Floris Geerts; Frank Neven

  • Affiliations: Informatics, University of Edinburgh & RCBD and SKLSDE Lab, Beihang University; University of Antwerp; Hasselt University & transnational University of Limburg

  • Venue: Proceedings of the VLDB Endowment
  • Year: 2013

Abstract

A query class is traditionally considered tractable if there exists a polynomial-time (PTIME) algorithm to answer its queries. When it comes to big data, however, PTIME algorithms often become infeasible in practice. A traditional and effective approach to coping with this is to preprocess data off-line, so that queries in the class can be subsequently evaluated on the data efficiently. This paper aims to provide a formal foundation for this approach in terms of computational complexity. (1) We propose a set of Π-tractable queries, denoted by ΠTQ0, to characterize classes of queries that can be answered in parallel poly-logarithmic time (NC) after PTIME preprocessing. (2) We show that several natural query classes are Π-tractable and are feasible on big data. (3) We also study a set ΠTQ of query classes that can be effectively converted to Π-tractable queries by refactorizing their data and queries for preprocessing. We introduce a form of NC reductions to characterize such conversions. (4) We show that a natural query class is complete for ΠTQ. (5) We also show that ΠTQ0 ⊂ P unless P = NC, i.e., the set ΠTQ0 of all Π-tractable queries is properly contained in the set P of all PTIME queries. Nonetheless, ΠTQ = P, i.e., all PTIME query classes can be made Π-tractable via proper refactorizations. This work is a step towards understanding the tractability of queries in the context of big data.
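
The preprocess-then-query pattern that the abstract formalizes can be illustrated with a small sketch. The example below is not taken from the paper; it assumes a simple membership query over a list of integers, and the function names `preprocess` and `member` are hypothetical. Sorting stands in for the one-time PTIME preprocessing step, and binary search stands in for the fast online step (sequential logarithmic time, which falls within NC).

```python
from bisect import bisect_left


def preprocess(data):
    """Offline step (assumed for illustration): sort once, O(n log n)."""
    return sorted(data)


def member(index, key):
    """Online step: binary search over the sorted index, O(log n) per query."""
    pos = bisect_left(index, key)
    return pos < len(index) and index[pos] == key


# One-time preprocessing, followed by any number of cheap queries.
index = preprocess([42, 7, 19, 3, 88])
print(member(index, 19))
print(member(index, 20))
```

Without the sorted index, each membership query would need a linear scan of the data; with it, the per-query cost no longer grows linearly with the data size, which is the kind of gain the notion of Π-tractability is meant to capture.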