DiscoveryLink: a system for integrated access to life sciences data sources

Authors:
L. M. Haas; P. M. Schwarz; P. Kodali; E. Kotlar; J. E. Rice; W. C. Swope
Affiliations:
IBM Software Group, Silicon Valley Laboratory, 555 Bailey Road, San Jose, California; IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, California; 3rd Millennium Inc., 125 Cambridge Park Drive, Cambridge, Massachusetts; Aventis Pharmaceuticals, Bridgewater, New Jersey; IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, California; IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, California
Venue:
IBM Systems Journal - Deep computing for the life sciences
Year:
2001

Abstract

Vast amounts of life sciences data reside today in specialized data sources, each with its own specialized query processing capabilities. Data from one source must often be combined with data from other sources to give users the information they need. Database middleware systems address this by extracting data from multiple sources in response to a single query. IBM's DiscoveryLink is one such system, targeted at applications in the life sciences industry. DiscoveryLink provides users with a virtual database to which they can pose arbitrarily complex queries, even though the data needed to answer a query may originate from several different sources, none of which is capable of answering the query by itself. We describe the DiscoveryLink offering, focusing on two key elements, the wrapper architecture and the query optimizer, and illustrate how it can be used to integrate access to life sciences data from heterogeneous data sources.
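
To make the idea of a virtual database over several sources concrete, the following is a minimal sketch in Python. It is not DiscoveryLink's interface: the `Wrapper` class, the `federated_join` function, and the sample compound and assay rows are all hypothetical, invented for illustration. The sketch shows only the general pattern the abstract describes, in which each source is hidden behind a wrapper and the middleware combines their rows to answer a query that no single source could answer alone. It uses a fixed hash join and does not attempt to model a query optimizer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class Wrapper:
    """Hypothetical wrapper: presents one data source as rows of a named table."""
    table: str
    fetch: Callable[[], Iterable[Row]]  # source-specific access is hidden here


def federated_join(left: Wrapper, right: Wrapper, key: str) -> List[Row]:
    """Combine rows from two wrapped sources on `key` with a simple hash join."""
    # Build a hash index over the right-hand source.
    index: Dict[object, List[Row]] = {}
    for row in right.fetch():
        index.setdefault(row[key], []).append(row)

    # Probe the index with each row of the left-hand source.
    result: List[Row] = []
    for row in left.fetch():
        for match in index.get(row[key], []):
            result.append({**row, **match})
    return result


# Made-up sample data standing in for two independent sources.
compounds = Wrapper("compounds", lambda: [
    {"compound_id": 1, "name": "A"},
    {"compound_id": 2, "name": "B"},
])
assays = Wrapper("assays", lambda: [
    {"compound_id": 1, "ic50_nm": 12.5},
    {"compound_id": 3, "ic50_nm": 40.0},
])

rows = federated_join(compounds, assays, "compound_id")
print(rows)
```

In this sketch the caller sees a single joined result and never touches either source directly. A real middleware system would additionally decide which parts of a query each source can evaluate itself and in what order to combine the results, which is the role the abstract assigns to the query optimizer.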