Integrating bioinformatic data sources over the SFSU ER design tools XML databus

  • Authors:
  • Yan Liu; Sorna Vincent; Marguerite C. Murphy

  • Affiliations:
  • San Francisco State University, San Francisco, CA; San Francisco State University, San Francisco, CA; San Francisco State University, San Francisco, CA

  • Venue:
  • ICWE '06 Workshop proceedings of the sixth international conference on Web engineering
  • Year:
  • 2006

Abstract

The SFSU ER Design Tools were developed to support database design and data integration over multiple implementation data models. The Tools allow users to enter and view Entity Relationship (ER) schemas and to translate them into a variety of equivalent implementation schemas, including Relational (ANSI SQL2), Object Oriented (ODMG 3.0), Spreadsheet (Universal Relation with associated functional dependencies) and W3C XML DTD. In addition, for each implementation data model, the Tools generate DDL statements to create a database, as well as simple JDBC/ODBC-based code to dump stored data into an XML file and to load data from an XML file into a database. Data can be transferred from one data store to another over an HTTP-based XML Databus. In this paper we describe the design and implementation of our XML Databus using Web Services, as well as a new strategy to support integration of bioinformatics data sets. Given the two schemas to be integrated, we first manually identify their semantically equivalent attributes, then automatically join the corresponding data sets into a single integrated collection of XML-formatted data. Our software is operational, and preliminary performance measurements over DTDs and data downloaded from the NIH-NCBI Web site show that our strategy is feasible for moderately sized data sets.
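
The two-step integration strategy described in the abstract (a manual attribute mapping followed by an automatic join) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the element names (gene_id, geneRef, symbol, name), the sample records, and the join_xml helper are all hypothetical, and the sketch assumes flat XML records with a single pair of equivalent attributes. It uses a simple in-memory hash join, which is only one plausible way to realize the automatic join.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names and values are illustrative only
# and are not taken from the paper or from NCBI.
GENES_XML = """<genes>
  <gene><gene_id>101</gene_id><symbol>ABC1</symbol></gene>
  <gene><gene_id>102</gene_id><symbol>XYZ2</symbol></gene>
</genes>"""

PROTEINS_XML = """<proteins>
  <protein><geneRef>101</geneRef><name>Protein A</name></protein>
  <protein><geneRef>103</geneRef><name>Protein C</name></protein>
</proteins>"""

# Step 1 (manual): a person declares which attributes mean the same thing.
EQUIVALENT = ("gene_id", "geneRef")


def join_xml(left_xml, right_xml, equivalent):
    """Step 2 (automatic): equi-join two flat XML record sets on one pair
    of semantically equivalent attributes, returning a new XML element."""
    left_key, right_key = equivalent
    left_root = ET.fromstring(left_xml)
    right_root = ET.fromstring(right_xml)

    # Index the right-hand records by their join value (hash join).
    index = {}
    for rec in right_root:
        value = rec.findtext(right_key)
        if value is not None:
            index.setdefault(value, []).append(rec)

    merged = ET.Element("integrated")
    for lrec in left_root:
        value = lrec.findtext(left_key)
        if value is None:
            continue  # records without a join value cannot be matched
        for rrec in index.get(value, []):
            out = ET.SubElement(merged, "record")
            for child in lrec:
                ET.SubElement(out, child.tag).text = child.text
            for child in rrec:
                if child.tag != right_key:  # skip the duplicate join column
                    ET.SubElement(out, child.tag).text = child.text
    return merged


integrated = join_xml(GENES_XML, PROTEINS_XML, EQUIVALENT)
print(ET.tostring(integrated, encoding="unicode"))
```

Because the sketch holds both data sets in memory, it is suited only to small or moderately sized inputs; larger collections would likely call for a streaming parser or a database-side join.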