BioScout: a life-science query monitoring system

  • Authors:
  • Anastasios Kementsietsidis;Frank Neven;Dieter Van de Craen

  • Affiliations:
  • IBM T. J. Watson Research Center;Hasselt University and Transnational Univ. of Limburg;Hasselt University and Transnational Univ. of Limburg

  • Venue:
  • EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
  • Year:
  • 2008

Abstract

Scientific data are available through an increasing number of heterogeneous, independently evolving sources. Although the sources evolve independently, the data stored in them are not independent: there are inherent and intricate relationships between the distributed data sets, and scientists are routinely required to write distributed queries in this setting. Being non-experts in computer science, the scientists face two major challenges: (i) how to express such distributed queries, a non-trivial task even if we assume that scientists are familiar with query languages like SQL, since such queries can become arbitrarily complex as more sources are considered; and (ii) how to evaluate such distributed queries efficiently. An efficient evaluation must account for batches of hundreds (or even thousands) of submitted queries and must optimize all of them as a whole. In this demo, we focus on the biological domain for illustration purposes (our solutions are applicable to other scientific domains) and present a system, called BioScout, that offers solutions to both of the above challenges. In more detail, we demonstrate the following functionality: (i) in BioScout, scientists draw their queries graphically, resulting in a query graph. The scientist need not be aware of the query language used or of any optimization issues. Given the query graph, the system generates, as a first step, an optimal query plan for the submitted query; (ii) BioScout uses four different strategies to combine the optimal query plans of individual queries into a global query plan for all the submitted queries. In the demo, we illustrate graphically how each of the four strategies works.
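
The abstract does not say what the four combination strategies are. As a purely illustrative sketch of the general idea of optimizing a batch "as a whole", the Python fragment below merges several query graphs by sharing identical join edges, so that a join needed by more than one query appears only once in the global plan. This is a generic common-subexpression approach, not necessarily one of BioScout's strategies. The source names (`Gene`, `Protein`, `Pathway`, `Disease`), the edge-list representation of a query graph, and the functions `build_global_plan` and `shared_work` are all assumptions made for this example.

```python
from collections import defaultdict


def build_global_plan(query_graphs):
    """Map each distinct join edge to the sorted list of queries that need it.

    query_graphs: dict of query name -> list of (source_a, source_b) edges.
    Edges are treated as undirected, so (A, B) and (B, A) are the same join.
    """
    users = defaultdict(set)
    for name, edges in query_graphs.items():
        for a, b in edges:
            users[frozenset((a, b))].add(name)
    # Normalise each edge to a sorted tuple so the plan is easy to inspect.
    return {tuple(sorted(edge)): sorted(qs) for edge, qs in users.items()}


def shared_work(global_plan):
    """Count join evaluations saved by evaluating each shared edge once."""
    return sum(len(qs) - 1 for qs in global_plan.values())


# Hypothetical batch of two query graphs over made-up sources.
queries = {
    "q1": [("Gene", "Protein"), ("Protein", "Pathway")],
    "q2": [("Protein", "Gene"), ("Gene", "Disease")],
}
plan = build_global_plan(queries)
saved = shared_work(plan)
```

Here the Gene/Protein join is requested by both queries, so the global plan lists it once with both queries attached. A real system would also have to weigh join order, source access costs, and data transfer, which this sketch ignores.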