SEEDEEP: A System for Exploring and Querying Scientific Deep Web Data Sources

Authors:
Fan Wang;Gagan Agrawal
Affiliations:
Department of Computer Science and Engineering, Ohio State University, Columbus OH 43210;Department of Computer Science and Engineering, Ohio State University, Columbus OH 43210
Venue:
SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
Year:
2009

Abstract

A recent and emerging trend in scientific data dissemination involves online databases that are hidden behind query forms, thus forming what is referred to as the deep web. In this paper, we propose SEEDEEP, a System for Exploring and quErying scientific DEEP web data sources. SEEDEEP automatically mines deep web data source schemas, integrates heterogeneous data sources, answers cross-source keyword queries, and incorporates features such as caching and fault tolerance. Currently, SEEDEEP integrates 16 deep web data sources in the biological domain. We demonstrate how an integrated model for correlated deep web data sources is constructed, how a complex cross-source keyword query is answered efficiently and correctly, and how important performance issues are addressed.