Mining from incomplete quantitative data by fuzzy rough sets

  • Authors:
  • Tzung-Pei Hong; Li-Huei Tseng; Been-Chian Chien

  • Affiliations:
  • Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung 811, Taiwan, ROC; Institute of Computer Science and Information Engineering, I-Shou University, Kaohsiung 840, Taiwan, ROC; Department of Computer Science and Information Engineering, National University of Tainan, Tainan 700, Taiwan, ROC

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2010

Abstract

Machine learning can extract desired knowledge from existing training examples and thereby ease the development bottleneck in building expert systems. Most learning approaches derive rules from complete data sets. A data set is called incomplete if some of its attribute values are unknown, and learning from incomplete data sets is usually more difficult than learning from complete ones. Rough-set theory has been widely used for data classification problems, but most conventional mining algorithms based on it identify relationships among data using crisp attribute values. Data with quantitative values, however, are common in real-world applications. In this paper, we therefore address the problem of learning from incomplete quantitative data sets based on rough sets. We propose a learning algorithm that simultaneously derives certain and possible fuzzy rules from incomplete quantitative data sets and estimates the missing values during the learning process. Quantitative values are first transformed into fuzzy sets of linguistic terms using membership functions. Each unknown attribute value is then assumed to be any possible linguistic term and is gradually refined according to the fuzzy incomplete lower and upper approximations derived from the given quantitative training examples. The examples and the approximations then interact with each other to derive certain and possible rules and to estimate appropriate values for the unknowns. The derived rules can then serve as knowledge about the incomplete quantitative data set.
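
The first two steps described in the abstract (fuzzifying quantitative values, and treating an unknown value as any possible linguistic term) can be illustrated with a minimal sketch. It is not the authors' implementation: the term names, the triangular membership functions, the 0 to 100 value range, and the choice to give an unknown value full membership in every term are all assumptions made here for illustration only.

```python
from typing import Dict, Optional

# Hypothetical linguistic terms, each a triangular membership function
# given as (left foot, peak, right foot). These are assumptions, not
# values taken from the paper.
TERMS = {
    "low": (0.0, 0.0, 50.0),
    "middle": (0.0, 50.0, 100.0),
    "high": (50.0, 100.0, 100.0),
}


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangle with feet a and c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(value: Optional[float]) -> Dict[str, float]:
    """Map a quantitative value to its nonzero linguistic-term memberships.

    A missing value (None) is assumed here to belong fully to every term,
    mirroring the idea that an unknown value may be any linguistic term
    until later refinement narrows it down.
    """
    if value is None:
        return {term: 1.0 for term in TERMS}
    memberships = {t: triangular(value, *p) for t, p in TERMS.items()}
    return {t: m for t, m in memberships.items() if m > 0.0}
```

Under these assumed membership functions, `fuzzify(25.0)` gives membership 0.5 in both `low` and `middle`, while `fuzzify(None)` keeps all three terms as candidates. The refinement of those candidates through the fuzzy incomplete lower and upper approximations is not shown, since the abstract does not specify it.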