The volume of biology-related information available in online databases has exploded, but finding the right information remains challenging. Not only is this information spread over multiple sources, it is often hidden behind the form interfaces of online databases. Several ongoing efforts aim to simplify the process of finding, integrating, and exploring these data; notable examples include the NCBI databases and the NAR database compilation. However, existing approaches are not scalable and require substantial manual input. As an important step towards a scalable solution to this problem, we describe a new infrastructure that automates, to a large extent, the process of locating and organizing online databases. We show how this infrastructure can be used to automate the construction and maintenance of a Molecular Biology database collection. We also provide an evaluation showing that the infrastructure is scalable and effective: it efficiently locates and accurately identifies the relevant online databases.