Quality assessment of MAGE-ML genomic datasets using DescribeX

  • Authors: Lorena Etcheverry; Shahan Khatchadourian; Mariano Consens

  • Affiliations: Instituto de Computación, Facultad de Ingeniería, Universidad de la República; University of Toronto; University of Toronto

  • Venue: DILS'10 Proceedings of the 7th international conference on Data integration in the life sciences
  • Year: 2010

Abstract

The functional genomics and informatics community has made extensive microarray experimental data available online, facilitating independent evaluation of experiment conclusions and enabling researchers to access and reuse a growing body of gene expression knowledge. While there are several data-exchange standards, numerous microarray experiment datasets are published using the MAGE-ML XML schema. Assessing the quality of published experiments is a challenging task, and there is no consensus among microarray users on a framework to measure dataset quality. In this paper, we develop techniques based on DescribeX (a summary-based visualization tool for XML) that quantitatively and qualitatively analyze MAGE-ML public collections, gaining insights about schema usage. We address specific questions such as detection of common instance patterns and coverage, precision of the experiment descriptions, and usage of controlled vocabularies. Our case study shows that DescribeX is a useful tool for the evaluation of microarray experiment data quality that enhances the understanding of the instance-level structure of MAGE-ML datasets.
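To make the idea of a structural summary concrete, the sketch below counts how often each root-to-element label path occurs in an XML document and reports what fraction of a given list of paths is actually used. It is not DescribeX's algorithm or API; it is only a minimal illustration of instance-level pattern counting and coverage. The function names `path_summary` and `coverage`, the toy document, and the list of "schema" paths are all assumptions made for illustration, and the element names only loosely imitate MAGE-ML.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_summary(xml_text):
    """Count how often each root-to-element label path occurs."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()

    def walk(elem, prefix):
        # Extend the label path with this element's tag and count it.
        path = prefix + "/" + elem.tag
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return counts


def coverage(counts, schema_paths):
    """Fraction of the given paths that appear at least once."""
    if not schema_paths:
        return 0.0
    used = [p for p in schema_paths if counts.get(p, 0) > 0]
    return len(used) / len(schema_paths)


# Hypothetical toy document, not a real MAGE-ML instance.
SAMPLE = """
<MAGE-ML>
  <Experiment_package>
    <Experiment identifier="E1"><Description/></Experiment>
    <Experiment identifier="E2"/>
  </Experiment_package>
</MAGE-ML>
"""

# Hypothetical list of paths a schema would allow.
SCHEMA_PATHS = [
    "/MAGE-ML/Experiment_package/Experiment",
    "/MAGE-ML/BioMaterial_package",
]

summary = path_summary(SAMPLE)
used_fraction = coverage(summary, SCHEMA_PATHS)

for path, n in sorted(summary.items()):
    print(n, path)
print("coverage:", used_fraction)
```

Paths that occur in every document of a collection would point to common instance patterns, while allowed paths with a zero count would indicate parts of the schema that the collection does not use.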