OntoDataClean: ontology-based integration and preprocessing of distributed data

Authors:
David Perez-Rey;Alberto Anguita;Jose Crespo
Affiliations:
Biomedical Informatics Group, Artificial Intelligence Laboratory, School of Computer Science, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid;Biomedical Informatics Group, Artificial Intelligence Laboratory, School of Computer Science, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid;Biomedical Informatics Group, Artificial Intelligence Laboratory, School of Computer Science, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid
Venue:
ISBMDA'06 Proceedings of the 7th international conference on Biological and Medical Data Analysis
Year:
2006

Abstract

Within the knowledge discovery in databases (KDD) process, the phases that precede data mining consume most of the time spent analysing data. Few research efforts have addressed these steps compared with data mining, suggesting that new approaches and tools are needed to support data preparation. In this regard, we present a new methodology for ontology-based KDD that adopts a federated approach to database integration and retrieval. Within this model, an ontology-based system called OntoDataClean has been developed to handle instance-level integration and data preprocessing. As part of the OntoDataClean development, a preprocessing ontology was built to store information about the required transformations. Various biomedical experiments were carried out, showing that data were correctly transformed using the preprocessing ontology. Although OntoDataClean does not cover every possible data transformation, the results suggest that ontologies are a suitable mechanism for improving quality in the various steps of KDD processes.