Query Planning for Searching Inter-dependent Deep-Web Databases

Authors:
Fan Wang; Gagan Agrawal; Ruoming Jin
Affiliations:
Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 (Fan Wang, Gagan Agrawal); Department of Computer Science, Kent State University, Kent, OH 44242 (Ruoming Jin)
Venue:
SSDBM '08: Proceedings of the 20th International Conference on Scientific and Statistical Database Management
Year:
2008



Abstract

Increasingly, many data sources appear as online databases hidden behind query forms, forming what is referred to as the deep web. It is desirable to have systems that provide a high-level, simple interface for users to query such data sources and that automate data retrieval from the deep web. However, such systems need to address the following challenges. First, in most cases no single database can provide all the desired data, so multiple databases need to be queried for a given user query. Second, because of the dependencies between deep-web databases, certain databases must be queried before others. Third, some databases may be unavailable at certain times because of network or hardware problems, so the query planner must be able to deal with unavailable databases and generate alternative plans when the optimal one is not feasible.

This paper considers query planning in the context of a deep-web integration system. We have developed a dynamic query planner that generates an efficient query order based on the database dependencies. Our query planner is able to select the top-K query plans. We have also developed cost models suitable for query planning for deep-web mining. We implemented and evaluated our approach in the context of a bioinformatics system, SNPMiner, and compared our algorithm with a naive algorithm and the optimal algorithm. For the 30 queries we used, our algorithm outperformed the naive algorithm and obtained results very similar to those of the optimal algorithm. Our experiments also demonstrate the scalability of our system with respect to the number of data sources involved and the number of query terms.
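The abstract names three requirements for the planner: dependencies force some databases to be queried before others, several candidate plans are ranked (top-K), and unavailable databases must be routed around. The toy sketch below illustrates what those requirements mean; it is not the paper's dynamic planner. It assumes a plan is a dependency-closed set of available databases whose outputs cover all query terms, assumes a plan's cost is simply the sum of per-database costs (a stand-in for the paper's cost models), and finds plans by exhaustive enumeration. The helper functions and the database names (`GeneDB`, `SNPDB`, `AltSNPDB`, `ProteinDB`) are hypothetical.

```python
import heapq
from itertools import combinations


def closure(dbs, depends_on):
    """Return `dbs` plus every database it (transitively) depends on."""
    result, stack = set(dbs), list(dbs)
    while stack:
        for dep in depends_on.get(stack.pop(), ()):
            if dep not in result:
                result.add(dep)
                stack.append(dep)
    return frozenset(result)


def query_order(dbs, depends_on):
    """Kahn's topological sort: every database follows its prerequisites.

    Ties are broken alphabetically so the order is deterministic.
    """
    pending = {db: set(depends_on.get(db, ())) & dbs for db in dbs}
    ready = [db for db, deps in pending.items() if not deps]
    heapq.heapify(ready)
    order = []
    while ready:
        db = heapq.heappop(ready)
        order.append(db)
        for other, deps in pending.items():
            if db in deps:
                deps.remove(db)
                if not deps:
                    heapq.heappush(ready, other)
    if len(order) != len(dbs):
        raise ValueError("cyclic dependencies between databases")
    return order


def top_k_plans(terms, provides, depends_on, cost, available, k):
    """Exhaustively enumerate plans and return the k cheapest as (cost, order).

    Assumed model (hypothetical): a plan is a dependency-closed set of
    available databases covering every term; its cost is the sum of
    per-database costs.
    """
    candidates = sorted(db for db in provides if db in available)
    seen, plans = set(), []
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            dbs = closure(subset, depends_on)
            if dbs in seen or not dbs <= available:
                continue  # duplicate plan, or it needs an unavailable database
            seen.add(dbs)
            covered = set().union(*(provides.get(db, set()) for db in dbs))
            if terms <= covered:
                plans.append((sum(cost[db] for db in dbs), query_order(dbs, depends_on)))
    plans.sort()
    return plans[:k]


# Hypothetical sources: SNPDB and ProteinDB need gene IDs from GeneDB,
# while AltSNPDB is a costlier SNP source with no dependencies.
provides = {"GeneDB": {"gene"}, "SNPDB": {"snp"},
            "AltSNPDB": {"snp"}, "ProteinDB": {"protein"}}
depends_on = {"SNPDB": {"GeneDB"}, "ProteinDB": {"GeneDB"}}
cost = {"GeneDB": 1, "SNPDB": 2, "AltSNPDB": 5, "ProteinDB": 3}
everything = set(provides)

print(top_k_plans({"snp", "protein"}, provides, depends_on, cost, everything, k=2))
# [(6, ['GeneDB', 'ProteinDB', 'SNPDB']), (9, ['AltSNPDB', 'GeneDB', 'ProteinDB'])]
print(top_k_plans({"snp", "protein"}, provides, depends_on, cost, everything - {"SNPDB"}, k=2))
# [(9, ['AltSNPDB', 'GeneDB', 'ProteinDB'])]
```

When `SNPDB` is marked unavailable, the cheapest plan disappears and the alternative plan through `AltSNPDB` is returned instead, which is the kind of fallback the abstract describes. Because exhaustive enumeration grows as 2^n in the number of sources, this sketch only illustrates the problem on a handful of databases; it is not a scalable planner.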