QDex: a database profiler for generic bio-data exploration and quality aware integration

  • Authors:
  • F. Moussouni;L. Berti-Équille;G. Rozé;O. Loréal;E. Guérin

  • Affiliations:
  • INSERM, CHU Pontchaillou, Rennes, France;IRISA, Campus Universitaire de Beaulieu, Rennes, France;INSERM, CHU Pontchaillou, Rennes, France;INSERM, CHU Pontchaillou, Rennes, France;INSERM, CHU Pontchaillou, Rennes, France

  • Venue:
  • WISE'07 Proceedings of the 2007 international conference on Web information systems engineering
  • Year:
  • 2007

Abstract

In human health and the life sciences, researchers collaborate extensively, sharing genomic, biomedical and experimental results. This requires dynamically integrating different databases into a single repository or warehouse. The data integrated into these warehouses are extracted from heterogeneous sources with differing degrees of quality and trust. Most of the time, they are neither rigorously chosen nor carefully controlled for data quality. Data preparation and data quality metadata are recommended but are still insufficiently exploited to ensure quality and to validate the results of information retrieval or data mining techniques. In previous work, we built a data warehouse called GEDAW (Gene Expression Data Warehouse) that stores various kinds of information: data on genes expressed in the liver during iron overload and liver diseases, relevant information from public databanks (mostly in XML), in-house DNA-chip experiments, and medical records. Based on this experience, this paper briefly reports the lessons learned about biomedical data integration and data quality issues, and the solutions we propose to the numerous problems of schema evolution in both the data sources and the warehousing system. In this context, we present QDex, a Quality-driven bio-Data Exploration tool, which provides a functional and modular architecture for database profiling and exploration, enabling users to set up query workflows and take advantage of data quality profiling metadata before the complex processes of data integration into the warehouse. An illustration using the QDex tool follows.
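
The abstract does not say which profiling metadata QDex computes. As a rough illustration of the general idea only, the sketch below derives two common per-field quality indicators (completeness and distinct-value ratio) from records before they would be integrated. The function name `profile_records`, the field names, and the sample records are all assumptions made for this example and are not taken from QDex or GEDAW.

```python
from typing import Any, Dict, List, Optional

# A record maps field names to values; None or "" is treated as missing.
Record = Dict[str, Optional[Any]]


def profile_records(records: List[Record]) -> Dict[str, Dict[str, float]]:
    """Compute simple per-field quality metadata (hypothetical helper).

    completeness   = share of records with a non-missing value for the field
    distinct_ratio = distinct non-missing values / non-missing values
    """
    fields = sorted({name for rec in records for name in rec})
    total = len(records)
    profile: Dict[str, Dict[str, float]] = {}
    for name in fields:
        # rec.get() returns None when a record lacks the field entirely.
        values = [rec.get(name) for rec in records]
        present = [v for v in values if v not in (None, "")]
        profile[name] = {
            "completeness": len(present) / total if total else 0.0,
            "distinct_ratio": len(set(present)) / len(present) if present else 0.0,
        }
    return profile


# Made-up sample records standing in for entries extracted from two sources.
sample: List[Record] = [
    {"gene_symbol": "HAMP", "organism": "Homo sapiens", "source": "A"},
    {"gene_symbol": "HFE", "organism": "Homo sapiens", "source": None},
    {"gene_symbol": "HAMP", "organism": "", "source": "B"},
    {"gene_symbol": "TFR2", "organism": "Homo sapiens"},
]

quality = profile_records(sample)
for field, metrics in quality.items():
    print(field, metrics)
```

Metadata of this kind could let a user see, before integration, which fields of a source are sparsely populated or heavily duplicated. How QDex itself represents or uses such metadata is not described in the abstract.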