Optimizing monitoring queries over distributed data

Authors:
Frank Neven; Dieter Van de Craen
Affiliations:
Hasselt University and Transnational University of Limburg, Diepenbeek, Belgium (both authors)
Venue:
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Year:
2006

Abstract

Scientific data in the life sciences is distributed over various independent, multi-format databases and is constantly expanding. We discuss a scenario in which a life-science research lab monitors, over time, the results of queries to remote databases beyond its control. Queries are registered at a local system and executed daily in batch mode. The goal of the paper is to study evaluation strategies that minimize the total number of database accesses when all queries are evaluated in bulk. We use an abstraction based on the relational model with fan-out constraints and conjunctive queries. We show that this problem remains NP-hard in two restricted settings: queries of bounded depth, and a fixed schema. We further show that imposing both restrictions together yields a tractable problem. As the constant in the latter algorithm is too high to be feasible in practice, we present four heuristic methods and compare them experimentally on randomly generated and biologically motivated schemas. Our algorithms are based on a greedy method and on approximations for the shortest common supersequence problem.
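The link to the shortest common supersequence (SCS) problem can be illustrated with the classic majority-merge heuristic from the SCS approximation literature. This is a minimal sketch under stated assumptions, not one of the four heuristics evaluated in the paper. As a simplifying assumption, each query is modeled as an ordered sequence of database accesses, and a single access in the merged schedule is assumed to serve every query whose next step is that database. The database names `A`, `B`, `C` and the helpers `majority_merge` and `is_subsequence` are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_merge(seqs: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Majority-merge heuristic for the shortest common supersequence.

    Repeatedly emits the symbol that occurs most often at the front of the
    remaining sequences, then strips that symbol from every sequence that
    starts with it. The result is always a common supersequence, but not
    necessarily a shortest one.
    """
    remaining = [list(s) for s in seqs if s]
    result: List[Hashable] = []
    while remaining:
        fronts = Counter(s[0] for s in remaining)
        # Ties are broken by first occurrence (Counter.most_common ordering).
        symbol, _ = fronts.most_common(1)[0]
        result.append(symbol)
        # One "access" to `symbol` advances every query waiting on it.
        remaining = [s[1:] if s[0] == symbol else s for s in remaining]
        remaining = [s for s in remaining if s]
    return result


def is_subsequence(small: Sequence[Hashable], big: Sequence[Hashable]) -> bool:
    """Return True if `small` appears in order (not necessarily contiguously) in `big`."""
    it = iter(big)
    return all(any(x == y for y in it) for x in small)


# Hypothetical access sequences for three registered queries.
queries = [("A", "B", "C"), ("A", "C"), ("B", "C")]
schedule = majority_merge(queries)
naive_accesses = sum(len(q) for q in queries)  # every query evaluated separately
print(schedule, len(schedule), naive_accesses)
```

Under these assumptions the merged schedule `A, B, C` needs 3 accesses instead of the 7 required when each query is evaluated on its own. Majority merge is greedy and can be far from optimal on adversarial inputs, which is consistent with the abstract's framing of SCS-based methods as approximations.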