Reverse engineering complex join queries

Authors:
Meihui Zhang;Hazem Elmeleegy;Cecilia M. Procopiuc;Divesh Srivastava
Affiliations:
National University of Singapore, Singapore, Singapore;Turn, Inc., Redwood City, CA, USA;AT&T Labs - Research, Florham Park, NJ, USA;AT&T Labs - Research, Florham Park, NJ, USA
Venue:
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Year:
2013

Citing 9
Cited 0

DBXplorer: A System for Keyword-Based Search over Relational Databases
ICDE '02 Proceedings of the 18th International Conference on Data Engineering

Keyword Searching and Browsing in Databases using BANKS
ICDE '02 Proceedings of the 18th International Conference on Data Engineering

BLINKS: ranked keyword searches on graphs
Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data

Querying Communities in Relational Databases
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering

Query by output
Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data

Keyword search in databases: the power of RDBMS
Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data

Synthesizing view definitions from data
Proceedings of the 13th International Conference on Database Theory

Sample-driven schema mapping
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data

SODA: generating SQL for business users
Proceedings of the VLDB Endowment

Abstract

We study the following problem: Given a database D with schema G and an output table Out, compute a join query Q that generates Out from D. A simpler variant allows Q to return a superset of Out. This problem has numerous applications, both on its own and as a building block for other problems. Related prior work imposes conditions on the structure of Q that simplify computation but are not always consistent with the application. We discuss several natural SQL queries that do not satisfy these conditions and cannot be discovered by prior work. In this paper, we propose an efficient algorithm that discovers queries with arbitrary join graphs. A crucial insight is that any graph can be characterized by the combination of a simple structure, called a star, and a series of merge steps over the star. The merge steps define a lattice over graphs derived from the same star. This allows us to explore the set of candidate solutions in a principled way and quickly prune out a large number of infeasible graphs. We also design several optimizations that significantly reduce the running time. Finally, we conduct an extensive experimental study over a benchmark database and show that our approach is scalable and accurately discovers complex join queries.
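
The problem statement in the abstract can be made concrete with a small sketch. The code below is not the paper's algorithm: it does not use stars, merge steps, or the lattice, and simply enumerates every two-table equi-join with every projection on a toy database. The tables `emp` and `dept`, their columns, and the functions `join_project` and `find_queries` are all hypothetical names chosen for illustration. It assumes that Out is non-empty, that every table has at least one row, and that column names are unique across tables.

```python
from itertools import combinations, permutations, product

# Toy database D (hypothetical): table name -> list of rows as dicts.
D = {
    "emp": [
        {"eid": 1, "name": "Ann", "dept_id": 10},
        {"eid": 2, "name": "Bob", "dept_id": 20},
    ],
    "dept": [
        {"did": 10, "dname": "Sales"},
        {"did": 20, "dname": "Ops"},
    ],
}

# Target output table Out, as a set of tuples.
Out = {("Ann", "Sales"), ("Bob", "Ops")}


def join_project(left, right, on, proj):
    """Equi-join two tables on the column pair `on`, then project onto `proj`."""
    lcol, rcol = on
    result = set()
    for lrow, rrow in product(left, right):
        if lrow[lcol] == rrow[rcol]:
            merged = {**lrow, **rrow}  # safe only because column names are unique
            result.add(tuple(merged[c] for c in proj))
    return result


def find_queries(db, out, exact=True):
    """Exhaustively search two-table equi-joins that generate `out`.

    exact=True  requires the query result to equal `out`.
    exact=False accepts any result that is a superset of `out`.
    """
    arity = len(next(iter(out)))  # assumes `out` is non-empty
    found = []
    for t1, t2 in combinations(sorted(db), 2):
        cols1, cols2 = list(db[t1][0]), list(db[t2][0])
        for on in product(cols1, cols2):
            for proj in permutations(cols1 + cols2, arity):
                res = join_project(db[t1], db[t2], on, proj)
                if (res == out) if exact else (res >= out):
                    found.append((t1, t2, on, proj))
    return found


exact_matches = find_queries(D, Out)
superset_matches = find_queries(D, {("Ann", "Sales")}, exact=False)
```

On this toy input, the only candidate found joins `dept.did` with `emp.dept_id` and projects `name` and `dname`. The number of candidates grows quickly with the number of tables and columns, which is the cost that a principled exploration with pruning, as described in the abstract, is meant to avoid.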