OntoDataClean: ontology-based integration and preprocessing of distributed data

  • Authors:
  • David Perez-Rey;Alberto Anguita;Jose Crespo

  • Affiliations:
  • Biomedical Informatics Group, Artificial Intelligence Laboratory, School of Computer Science, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid;Biomedical Informatics Group, Artificial Intelligence Laboratory, School of Computer Science, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid;Biomedical Informatics Group, Artificial Intelligence Laboratory, School of Computer Science, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid

  • Venue:
  • ISBMDA'06 Proceedings of the 7th international conference on Biological and Medical Data Analysis
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Within the knowledge discovery in databases (KDD) process, previous phases to data mining consume most of the time spent analysing data. Few research efforts have been carried out in theses steps compared to data mining, suggesting that new approaches and tools are needed to support the preparation of data. As regards, we present in this paper a new methodology of ontology-based KDD adopting a federated approach to database integration and retrieval. Within this model, an ontology-based system called OntoDataClean has been developed dealing with instance-level integration and data preprocessing. Within the OntoDataClean development, a preprocessing ontology was built to store the information about the required transformations. Various biomedical experiments were carried out, showing that data have been correctly transformed using the preprocessing ontology. Although OntoDataClean does not cover every possible data transformation, it suggests that ontologies are a suitable mechanism to improve quality in the various steps of KDD processes.