A set of experiments to consider data quality criteria in classification techniques for data mining

  • Authors: Roberto Espinosa; José Zubcoff; Jose-Norberto Mazón

  • Affiliations: University of Matanzas, Cuba; Dept. of Sea Sciences and Applied Biology, University of Alicante, Spain; Dept. of Software and Computing Systems, University of Alicante, Spain

  • Venue: ICCSA'11 Proceedings of the 2011 international conference on Computational science and its applications - Volume Part II
  • Year: 2011

Abstract

A successful data mining process depends on the data quality of its sources in order to obtain reliable knowledge. Preprocessing is therefore required to deal with data quality criteria. However, preprocessing has traditionally been seen as a time-consuming and non-trivial task, since data quality criteria have to be considered without any guidance on how they affect the data mining process. To overcome this situation, in this paper we propose analyzing data mining techniques to understand how different data quality criteria behave in the sources and how they affect the results of the algorithms. To this end, we have conducted a set of experiments to assess three data quality criteria: completeness, correlation and balance of data. This work is a first step towards considering data quality criteria in a systematic and structured manner, in order to support and guide data miners in obtaining reliable knowledge.
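
The abstract does not say how the three criteria are measured. As a rough illustration only, the sketch below computes one plausible indicator for each: the share of non-missing cells (completeness), the Pearson coefficient between two numeric attributes (correlation), and the ratio of the rarest to the most frequent class (balance). These definitions and the function names `completeness`, `pearson` and `class_balance` are assumptions made for this example, not the authors' metrics.

```python
from collections import Counter
from math import sqrt


def completeness(rows):
    """Share of cells that are not None (assumed definition of completeness)."""
    cells = [value for row in rows for value in row]
    return sum(value is not None for value in cells) / len(cells)


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric columns.

    Raises ZeroDivisionError if either column is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def class_balance(labels):
    """Rarest-class count divided by most-frequent-class count (1.0 = balanced)."""
    counts = Counter(labels).values()
    return min(counts) / max(counts)


# Hypothetical toy data: two numeric attributes and a class label per record.
rows = [(1.0, 2.0), (2.0, None), (3.0, 6.0), (4.0, 8.0)]
labels = ["a", "a", "a", "b"]

print("completeness:", completeness(rows))
print("correlation: ", pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
print("balance:     ", class_balance(labels))
```

An experiment in the spirit of the abstract could vary one of these indicators at a time (for example, by removing values or resampling classes) and compare a classifier's accuracy before and after.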