An evaluation of the use of XML for representation, querying, and analysis of molecular interactions

  • Authors:
  • Lena Strömbäck; David Hall

  • Affiliations:
  • Department of Computer and Information Science, Linköping University, Sweden (both authors)

  • Venue:
  • EDBT'06: Proceedings of the 2006 International Conference on Current Trends in Database Technology
  • Year:
  • 2006

Abstract

Biology researchers are rapidly generating new information on how genes, proteins and other molecules interact in living organisms. To fully understand the machinery underlying life, these large quantities of data must be integrated and analyzed. As one step in this direction, new XML-based standards for describing molecular interactions have been defined. This work evaluates the XML query language XQuery for querying molecular-interaction data, since users would benefit greatly from working directly on data represented in the new standards. We use and compare four available XQuery implementations (eXist, X-Hive, Sedna and QizX/open) for querying and analyzing data exported from existing databases. Our conclusion is that XQuery can easily be used for the most common queries in this domain but is not feasible for more complex analyses. In particular, for queries involving path analysis the available XQuery implementations perform poorly, and an extension of the GTL package clearly outperforms XQuery. The paper concludes with a discussion of the usability of XQuery in this domain; in particular, we point out the need for more efficient graph handling and the fact that XQuery requires the user to understand the exact XML format of each dataset.
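To illustrate why path analysis is a graph problem rather than a simple selection query, the sketch below reads interactions from XML, builds an adjacency map, and finds a shortest chain of interactions between two molecules with breadth-first search. It is a minimal sketch under stated assumptions: the `<interactions>`/`<interaction>`/`<participant ref="…">` format is a hypothetical, heavily simplified stand-in (not the real PSI-MI or SBML schema), the helper names `build_graph` and `shortest_path` are illustrative, and plain BFS in Python is used here in place of the paper's GTL-based extension. XQuery 1.0 has no built-in graph traversal, so the same query there is typically written as a recursive user-defined function.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical, heavily simplified interaction format (NOT the real PSI-MI or SBML schema).
DATA = """
<interactions>
  <interaction><participant ref="A"/><participant ref="B"/></interaction>
  <interaction><participant ref="B"/><participant ref="C"/></interaction>
  <interaction><participant ref="C"/><participant ref="D"/></interaction>
  <interaction><participant ref="A"/><participant ref="E"/></interaction>
</interactions>
"""

def build_graph(xml_text):
    """Turn each <interaction> into undirected edges between all of its participants."""
    root = ET.fromstring(xml_text.strip())
    graph = {}
    for inter in root.iter("interaction"):
        refs = [p.get("ref") for p in inter.findall("participant")]
        for i, a in enumerate(refs):
            for b in refs[i + 1:]:
                graph.setdefault(a, set()).add(b)
                graph.setdefault(b, set()).add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a node list, or None."""
    if start not in graph or goal not in graph:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph[node]):  # sorted for deterministic output
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

graph = build_graph(DATA)
print(shortest_path(graph, "A", "D"))  # ['A', 'B', 'C', 'D']
```

Note that `build_graph` hard-codes the element and attribute names of this one format; pointing it at a dataset in a different standard would require rewriting it, which mirrors the abstract's observation that XQuery users must know the exact XML format of each dataset.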