Scientific data in the life sciences is distributed over various independent multi-format databases and is constantly expanding. We discuss a scenario in which a life science research lab monitors, over time, the results of queries to remote databases beyond its control. Queries are registered at a local system and executed daily in batch mode. The goal of the paper is to study evaluation strategies that minimize the total number of database accesses when evaluating all queries in bulk. We use an abstraction based on the relational model with fan-out constraints and conjunctive queries. We show that this problem remains NP-hard in two restricted settings: queries of bounded depth and the scenario with a fixed schema. We further show that both restrictions taken together result in a tractable problem. Because the constant of the latter algorithm is too high for it to be feasible in practice, we present four heuristic methods that are experimentally compared on randomly generated and biologically motivated schemas. Our algorithms are based on a greedy method and approximations for the shortest common supersequence problem.
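To illustrate the connection to the shortest common supersequence problem, the following is a minimal sketch of the well-known Majority-Merge greedy heuristic. It is not taken from the paper and may differ from the four heuristics evaluated there. It assumes, purely for illustration, that each query can be modeled as an ordered list of database names it must access, so that a common supersequence acts as one shared access schedule serving all queries. The function names `majority_merge` and `is_supersequence` and the database labels are hypothetical.

```python
from collections import Counter


def majority_merge(sequences):
    """Greedy Majority-Merge heuristic: builds a common supersequence of the
    input sequences (not guaranteed to be the shortest one)."""
    # Work on copies and ignore empty sequences.
    remaining = [list(seq) for seq in sequences if seq]
    schedule = []
    while remaining:
        # Count how often each symbol is the next pending item of a sequence.
        fronts = Counter(seq[0] for seq in remaining)
        # Pick the most frequent front symbol; break ties by symbol order
        # so the result is deterministic.
        symbol = min(fronts, key=lambda s: (-fronts[s], s))
        schedule.append(symbol)
        # Consume the chosen symbol from every sequence that starts with it.
        remaining = [seq[1:] if seq[0] == symbol else seq for seq in remaining]
        remaining = [seq for seq in remaining if seq]
    return schedule


def is_supersequence(candidate, seq):
    """Return True if seq occurs in candidate as a (not necessarily
    contiguous) subsequence."""
    it = iter(candidate)
    return all(item in it for item in seq)


# Hypothetical example: three queries, each an ordered list of databases.
queries = [["A", "B", "C"], ["A", "C", "D"], ["B", "C"]]
schedule = majority_merge(queries)
print(schedule)  # ['A', 'B', 'C', 'D']
```

In this toy instance, evaluating the three queries independently would take eight accesses, while the shared schedule needs four. Under the stated modeling assumption, a shorter supersequence corresponds to fewer database accesses in total.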