Making queries tractable on big data with preprocessing: through the eyes of complexity theory

Authors:
Wenfei Fan; Floris Geerts; Frank Neven
Affiliations:
Informatics, University of Edinburgh & RCBD and SKLSDE Lab, Beihang University; University of Antwerp; Hasselt University & transnational University of Limburg
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Abstract

A query class is traditionally considered tractable if there exists a polynomial-time (PTIME) algorithm to answer its queries. When it comes to big data, however, PTIME algorithms often become infeasible in practice. A traditional and effective approach to coping with this is to preprocess the data off-line, so that queries in the class can subsequently be evaluated on the data efficiently. This paper aims to provide a formal foundation for this approach in terms of computational complexity. (1) We propose a set of Π-tractable queries, denoted by $\Pi T^0_Q$, to characterize classes of queries that can be answered in parallel polylogarithmic time (NC) after PTIME preprocessing. (2) We show that several natural query classes are Π-tractable and are feasible on big data. (3) We also study a set $\Pi T_Q$ of query classes that can be effectively converted to Π-tractable queries by re-factorizing their data and queries for preprocessing. We introduce a form of NC reductions to characterize such conversions. (4) We show that a natural query class is complete for $\Pi T_Q$. (5) We also show that $\Pi T^0_Q$ ⊂ P unless P = NC, i.e., the set $\Pi T^0_Q$ of all Π-tractable queries is properly contained in the set P of all PTIME queries. Nonetheless, $\Pi T_Q$ = P, i.e., all PTIME query classes can be made Π-tractable via proper re-factorizations. This work is a step towards understanding the tractability of queries in the context of big data.
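The preprocess-then-query pattern described in the abstract can be illustrated with a deliberately simple sketch. It is an assumption chosen for illustration, not a construction taken from the paper: the data is a list of integers, the off-line step sorts it once (polynomial time), and each membership query is then answered by binary search in logarithmic sequential time, which is in particular polylogarithmic. The names `preprocess`, `answer`, and `answer_by_scan` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def preprocess(data: Sequence[int]) -> List[int]:
    """Off-line step: sort the data once, in O(n log n) time."""
    return sorted(data)


def answer(index: List[int], key: int) -> bool:
    """On-line step: membership test by binary search on the sorted index."""
    pos = bisect_left(index, key)
    return pos < len(index) and index[pos] == key


def answer_by_scan(data: Sequence[int], key: int) -> bool:
    """Baseline without preprocessing: a linear scan over the raw data."""
    return any(item == key for item in data)


# Hypothetical toy data set; real workloads would be far larger.
raw = [42, 7, 19, 3, 88, 61]
index = preprocess(raw)

# Both strategies agree on every query; only the per-query cost differs.
for key in (19, 20, 88):
    assert answer(index, key) == answer_by_scan(raw, key)
```

The scan costs time linear in the data size for every query, while the sorted index pays its cost once and then serves each query in a logarithmic number of comparisons. This is only the intuition behind the approach; the formal definitions of Π-tractability and of the reductions are those given in the paper itself.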