Query reformulation for dynamic information integration
Journal of Intelligent Information Systems - Special issue on intelligent integration of information
InfoSleuth: agent-based semantic integration of information in open and dynamic environments
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
A scalable comparison-shopping agent for the World-Wide Web
AGENTS '97 Proceedings of the first international conference on Autonomous agents
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
SPHINX: a framework for creating personal, site-specific Web crawlers
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Efficient evaluation of queries in a mediator for WebSources
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Mercator: A scalable, extensible Web crawler
World Wide Web
RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Proceedings of the 27th International Conference on Very Large Data Bases
Querying Heterogeneous Information Sources Using Source Descriptions
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Optimized Seamless Integration of Biomolecular Data
BIBE '01 Proceedings of the 2nd IEEE International Symposium on Bioinformatics and Bioengineering
XWRAP: An XML-Enabled Wrapper Construction System for Web Information Sources
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Extracting structured data from Web pages
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
DiscoveryLink: a system for integrated access to life sciences data sources
IBM Systems Journal - Deep computing for the life sciences
Inference of concise DTDs from XML data
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Inference of concise regular expressions and DTDs
ACM Transactions on Database Systems (TODS)
Automatically constructing a directory of molecular biology databases
DILS'07 Proceedings of the 4th international conference on Data integration in the life sciences
Domain-independent classification for deep web interfaces
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Automatic generation of data types for classification of deep web sources
DILS'05 Proceedings of the Second international conference on Data Integration in the Life Sciences
Hi-index | 0.00 |
The World Wide Web provides a vast resource to genomics researchers, with Web-based access to distributed data sources such as BLAST sequence homology search interfaces. However, finding the desired scientific information can still be very tedious and frustrating. While there are several known servers on genomic data (e.g., GeneBank, EMBL, NCBI) that are shared and accessed frequently, new data sources are created each day in laboratories all over the world. Sharing these new genomics results is hindered by the lack of a common interface or data exchange mechanism. Moreover, the number of autonomous genomics sources and their rate of change outpace the speed at which they can be manually identified, meaning that the available data is not being utilized to its full potential. An automated system that can find, classify, describe, and wrap new sources without tedious and low-level coding of source-specific wrappers is needed to assist scientists in accessing hundreds of dynamically changing bioinformatics Web data sources through a single interface. A correct classification of any kind of Web data source must address both the capability of the source and the conversation/interaction semantics inherent in the design of the data source. We propose a service class description (SCD)-a meta-data approach for classifying Web data sources that takes into account both the capability and the conversational semantics of the source. The ability to discover the interaction pattern of a Web source leads to increased accuracy in the classification process. Our results show that an SCD-based approach successfully classifies two thirds of BLAST sites with 100% accuracy and two thirds of bioinformatics keyword search sites with around 80% precision.