Using semantic web tools to integrate experimental measurement data on our own terms

  • Authors:
  • M. Scott Marshall;Lennart Post;Marco Roos;Timo M. Breit

  • Affiliations:
  • Integrative Bioinformatics Unit;Integrative Bioinformatics Unit;Integrative Bioinformatics Unit;Integrative Bioinformatics Unit

  • Venue:
  • OTM'06 Proceedings of the 2006 international conference on On the Move to Meaningful Internet Systems: AWeSOMe, CAMS, COMINF, IS, KSinBIT, MIOS-CIAO, MONET - Volume Part I
  • Year:
  • 2006

Abstract

The -omics data revolution, galvanized by the development of the web, has resulted in large numbers of valuable public databases and repositories. Scientists wishing to employ this data for their research are faced with the question of how to approach data integration. Ad hoc solutions can result in diminished generality, interoperability, and reusability, as well as loss of data provenance. One of the promising notions that the Semantic Web brings to the life sciences is that experimental data can be described with relevant life science terms and concepts. Subsequent integration and analysis can then take advantage of those terms, exposing logic that might otherwise only be available from the interpretation of program code. In the context of a biological use case, we examine a general Semantic Web approach to integrating experimental measurement data with Semantic Web tools such as Protégé and Sesame. The approach to data integration that we define is based on the linking of data with OWL classes. The general pattern that we apply consists of 1) building application-specific ontologies for “myModel”, 2) identifying the concepts involved in the biological hypothesis, 3) finding data instances of the concepts, 4) finding a common domain to be used for integration, and 5) integrating the data. Our experience with current tools indicates a few Semantic Web bottlenecks, such as a general lack of ‘semantic disclosure’ from public data resources and the need for better ‘interval join’ performance from RDF query engines.
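
The ‘interval join’ named in the abstract can be illustrated with a minimal sketch. The abstract does not say which common domain or which data sets the authors used, so the sketch assumes integer coordinate ranges, and every name in it (Interval, interval_join, the probe and gene labels) is hypothetical. It is plain Python rather than an RDF query, and only shows the kind of overlap matching that such a join has to perform.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A labelled, inclusive integer range (hypothetical data model)."""
    label: str
    start: int
    end: int


def interval_join(left: List[Interval], right: List[Interval]) -> List[Tuple[str, str]]:
    """Return (left label, right label) pairs whose ranges overlap."""
    # Sort the right side by start so candidates can be cut off with a binary search.
    right_sorted = sorted(right, key=lambda iv: iv.start)
    starts = [iv.start for iv in right_sorted]
    pairs = []
    for a in left:
        # Only right intervals that start at or before a.end can overlap a.
        upper = bisect_right(starts, a.end)
        for b in right_sorted[:upper]:
            # Of those, keep the ones that end at or after a.start.
            if b.end >= a.start:
                pairs.append((a.label, b.label))
    return pairs


# Hypothetical example data: measurements and annotated features on one shared axis.
measurements = [Interval("probe_1", 100, 200), Interval("probe_2", 500, 600)]
features = [
    Interval("gene_A", 150, 300),
    Interval("gene_B", 250, 550),
    Interval("gene_C", 700, 800),
]
result = interval_join(measurements, features)
print(result)
```

Sorting one side keeps the candidate scan short when intervals are spread out, but the worst case is still quadratic. A query engine that evaluates the same overlap condition as a generic filter over all pairs would pay that cost every time, which may be one reason the authors single out this operation as a bottleneck.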