Case-based reasoning for classification in the mixed data sets employing the compound distance methods

  • Authors:
  • Mohammad Taghi Rezvan;Ali Zeinal Hamadani;Ali Shalbafzadeh

  • Affiliations:
  • -;-;-

  • Venue:
  • Engineering Applications of Artificial Intelligence
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Development of classification methods using case-based reasoning systems is an active area of research. In this paper, two new case-based reasoning systems with two similarity measures that support mixed categorical and numerical data as well as only categorical data are proposed. The principal difference between these two measures lies in the calculations of distance for categorical data. The first one, named distance in unsupervised learning (DUL), is derived from co-occurrence of values, and the other one, named distance in supervised learning (DSL), is used to calculate the distance between two values of the same feature with respect to every other feature for a given class. However, the distance between numerical data is computed using the Euclidean distance. Furthermore, the importance of numeric features is determined by linear discrimination analysis (LDA) and the weight assignment to categorical features depends on co-occurrence of feature values when calculating the similarity between a new case and the old one. The performance of the proposed case-based reasoning systems has been investigated on the University of California, Irvine (UCI) data sets by 5-fold cross validation. The results indicate that these case-based reasoning systems will produce a proper performance in predictive accuracy and interpretability.