Information integration and knowledge acquisition from semantically heterogeneous biological data sources

Authors:
Doina Caragea;Jyotishman Pathak;Jie Bao;Adrian Silvescu;Carson Andorf;Drena Dobbs;Vasant Honavar
Affiliations:
Department of Computer Science, AI Research Laboratory;Department of Computer Science, AI Research Laboratory;Department of Computer Science, AI Research Laboratory;Department of Computer Science, AI Research Laboratory;Department of Computer Science, AI Research Laboratory;Department of Genetics, Development and Cell Biology, 1210 Molecular Biology;Department of Computer Science, AI Research Laboratory
Venue:
DILS'05 Proceedings of the Second international conference on Data Integration in the Life Sciences
Year:
2005

Abstract

We present INDUS (Intelligent Data Understanding System), a federated, query-centric system for knowledge acquisition from autonomous, distributed, semantically heterogeneous data sources that can be viewed (conceptually) as tables. INDUS employs ontologies and inter-ontology mappings to enable a user or an application to view a collection of such data sources (regardless of location, internal structure, and query interfaces) as though they were a collection of tables structured according to an ontology supplied by the user. This allows INDUS to answer user queries against distributed, semantically heterogeneous data sources without the need for a centralized data warehouse or a common global ontology. We used the INDUS framework to design algorithms for learning probabilistic models (e.g., Naive Bayes models) that predict the GO functional classification of a protein from training sequences distributed among the SWISSPROT and MIPS data sources. Mappings such as EC2GO and MIPS2GO were used to resolve the semantic differences between these data sources when answering queries posed by the learning algorithms. Our results show that INDUS can be successfully used for integrative analysis of data from multiple sources needed for collaborative discovery in computational biology.
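
The idea in the abstract, learning a Naive Bayes model from sources that label the same concepts differently without first pooling the raw data, can be illustrated with a small sketch. This is not the INDUS implementation or its API. The two toy sources, their label vocabularies, the mapping tables, the user-ontology labels, and every function name are hypothetical stand-ins, and the mappings are not real EC2GO or MIPS2GO entries. The sketch assumes each source answers a count query locally, translating its own labels into the user's ontology, and that the learner only sums the returned counts.

```python
from collections import Counter
import math

# Hypothetical mappings from each source's local labels to user-ontology labels.
SOURCE_A_TO_USER = {"a:kinase": "U:catalysis", "a:carrier": "U:transport"}
SOURCE_B_TO_USER = {"b:01": "U:catalysis", "b:20": "U:transport"}

# Toy (sequence, local label) records; not real protein data.
SOURCE_A = [("MKKL", "a:kinase"), ("MKRL", "a:kinase"), ("GAVV", "a:carrier")]
SOURCE_B = [("MKKR", "b:01"), ("GAVL", "b:20"), ("GGVV", "b:20")]


def count_query(records, mapping):
    """Answer a count query at one source, expressed in user-ontology labels."""
    class_counts = Counter()
    residue_counts = Counter()
    for sequence, local_label in records:
        label = mapping[local_label]  # resolve the semantic difference
        class_counts[label] += 1
        for residue in sequence:
            residue_counts[(label, residue)] += 1
    return class_counts, residue_counts


def learn_naive_bayes(sources):
    """Sum per-source counts; raw records never leave their source."""
    class_counts, residue_counts = Counter(), Counter()
    for records, mapping in sources:
        source_classes, source_residues = count_query(records, mapping)
        class_counts.update(source_classes)      # Counter.update adds counts
        residue_counts.update(source_residues)
    return class_counts, residue_counts


def predict(model, sequence, alphabet_size=20):
    """Return the most probable label using Laplace-smoothed residue frequencies."""
    class_counts, residue_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_examples in class_counts.items():
        residues_in_class = sum(
            count for (lbl, _), count in residue_counts.items() if lbl == label
        )
        score = math.log(n_examples / total)
        for residue in sequence:
            numerator = residue_counts[(label, residue)] + 1
            score += math.log(numerator / (residues_in_class + alphabet_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = learn_naive_bayes(
    [(SOURCE_A, SOURCE_A_TO_USER), (SOURCE_B, SOURCE_B_TO_USER)]
)
print(predict(model, "MKKL"))
```

Because Naive Bayes depends on the data only through counts, the summed per-source counts yield the same model as counting over a merged table, which is why no central warehouse is needed in this toy setting.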