Attribute reduction of data with error ranges and test costs

  • Authors:
  • Fan Min;William Zhu

  • Affiliations:
  • Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363000, China;Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363000, China

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2012

Quantified Score

Hi-index 0.07

Visualization

Abstract

In data mining applications, we have a number of measurement methods to obtain a data item with different test costs and different error ranges. Test costs refer to time, money, or other resources spent in obtaining data items related to some object; observational errors correspond to differences in measured and true value of a data item. In supervised learning, we need to decide which data items to obtain and which measurement methods to employ, so as to minimize the total test cost and help in constructing classifiers. This paper studies this problem in four steps. First, data models are built to address error ranges and test costs. Second, error-range-based covering rough set is constructed to define lower and upper approximations, positive regions, and relative reducts. A closely related theory deals with neighborhood rough set, which has been successfully applied to heterogeneous attribute reduction. The major difference between the two theories is the definition of neighborhood. Third, the minimal test cost attribute reduction problem is redefined in the new theory. Fourth, both backtrack and heuristic algorithms are proposed to deal with the new problem. The algorithms are tested on ten UCI (University of California - Irvine) datasets. Experimental results show that the backtrack algorithm is efficient on rational-sized datasets, the weighting mechanism for the heuristic information is effective, and the competition approach can improve the quality of the result significantly. This study suggests new research trends concerning attribute reduction and covering rough set.