On-the-Fly Integration and Ad Hoc Querying of Life Sciences Databases Using LifeDB

Authors:
Anupam Bhattacharjee;Aminul Islam;Mohammad Shafkat Amin;Shahriyar Hossain;Shazzad Hosain;Hasan Jamil;Leonard Lipovich
Affiliations:
Department of Computer Science, Wayne State University, USA;Department of Computer Science, Wayne State University, USA;Department of Computer Science, Wayne State University, USA;Department of Computer Science, Wayne State University, USA;Department of Computer Science, Wayne State University, USA;Department of Computer Science, Wayne State University, USA;Center for Molecular Medicine and Genetics, Wayne State University, USA
Venue:
DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
Year:
2009

Citing 10
Cited 4

DEByE - Data extraction by example

Data & Knowledge Engineering
A Heterogeneous Field Matching Method for Record Linkage

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
GORDIAN: efficient and scalable discovery of composite keys

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
XRPC: interoperable and efficient distributed XQuery

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
A relational approach to incrementally extracting and querying structure in unstructured data

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
BioFlow: A Web-Based Declarative Workflow Language for Life Sciences

SERVICES '08 Proceedings of the 2008 IEEE Congress on Services - Part I
The Power of Declarative Languages: A Comparative Exposition of Scientific Workflow Design Using BioFlow and Taverna

SERVICES '09 Proceedings of the 2009 Congress on Services - I
A Visual Interface for on-the-fly Biological Database Integration and Workflow Design Using VizBuilder

DILS '09 Proceedings of the 6th International Workshop on Data Integration in the Life Sciences
OntoMatch: a monotonically improving schema matching system for autonomous data integration

IRI'09 Proceedings of the 10th IEEE international conference on Information Reuse & Integration
FastWrap: an efficient wrapper for tabular data extraction from the web

IRI'09 Proceedings of the 10th IEEE international conference on Information Reuse & Integration

Data integration systems for scientific applications

OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems
Integrating large and distributed life sciences resources for systems biology research: progress and new challenges

Transactions on large-scale data- and knowledge-centered systems III
A secured collaborative model for data integration in life sciences

Transactions on large-scale data- and knowledge-centered systems IV
Querying KEGG pathways in logic

International Journal of Data Mining and Bioinformatics

Quantified Score

Hi-index	0.00

Abstract

Data-intensive applications in Life Sciences extensively use the hidden web as a platform for information sharing. Access to these heterogeneous hidden web resources is limited to predefined web forms and interactive interfaces that users navigate manually. Users also assume responsibility for reconciling schema heterogeneity, extracting information and piping it to other resources, transforming formats, and so on, in order to implement desired query sequences or scientific workflows. In this paper, we present a new data management system, called LifeDB, in which we offer support for currency without view materialization and autonomous reconciliation of schema heterogeneity in a single platform through a declarative query language called BioFlow. In our approach, schema heterogeneity is resolved at run time by treating the hidden web resources as a virtual warehouse, and by supporting a set of primitives for integrating data on the fly, extracting information and piping it to other resources, and manipulating data in a way similar to traditional database systems to respond to application demands.