Scalable multi-query optimization for exploratory queries over federated scientific databases

  • Authors:
  • Anastasios Kementsietsidis; Frank Neven; Dieter Van de Craen; Stijn Vansummeren

  • Affiliations:
  • IBM T.J. Watson Research Center, New York; Hasselt University and Transnational University of Limburg, Belgium; Hasselt University and Transnational University of Limburg, Belgium; Hasselt University and Transnational University of Limburg, Belgium

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2008

Abstract

The diversity and large volumes of data processed in the Natural Sciences today have led to a proliferation of highly specialized and autonomous scientific databases with inherent and often intricate relationships. As a user-friendly method for querying this complex, ever-expanding network of sources for correlations, we propose exploratory queries. Exploratory queries are loosely structured and hence require only minimal user knowledge of the source network. Evaluating an exploratory query usually involves the evaluation of many distributed queries. As the number of such distributed queries can quickly become large, we attack the optimization problem for exploratory queries by proposing several multi-query optimization algorithms that compute a global evaluation plan while minimizing the total communication cost, a key bottleneck in distributed settings. The proposed algorithms are necessarily heuristics, as computing an optimal global evaluation plan is shown to be NP-hard. Finally, we present an implementation of our algorithms, along with experiments that illustrate their potential not only for the optimization of exploratory queries, but also for the multi-query optimization of large batches of standard queries.