Integrating large and distributed life sciences resources for systems biology research: progress and new challenges

Authors:
Hasan Jamil
Affiliations:
Department of Computer Science, Wayne State University
Venue:
Transactions on large-scale data- and knowledge-centered systems III
Year:
2011

Abstract

Researchers in Systems Biology routinely access vast collections of hidden web research resources freely available on the internet. These collections include online data repositories, online and downloadable data analysis tools, publications, text mining systems, and visualization artifacts. Almost always, these resources have complex data formats that are heterogeneous in representation, data type, interpretation, and even identity. As a result, researchers are often forced to develop analysis pipelines and data management applications that involve extensive and prohibitive manual interaction. Such approaches act as a barrier to optimal use of these resources and thus impede the progress of research. In this paper, we discuss our experience of building a new middleware approach to data and application integration for Systems Biology that leverages recent developments in schema matching, wrapper generation, workflow management, and query language design. This approach advocates ad hoc integration of arbitrary resources and construction of computational pipelines using a declarative language. We highlight the features and advantages of this new data management system, called LifeDB, and its query language, BioFlow. Based on our experience, we highlight the new challenges it raises and potential solutions to these research issues on the way toward a viable platform for large-scale autonomous data integration. We believe the research issues we raise are of general interest to the autonomous data integration community and apply equally to research unrelated to LifeDB.