Analogy-based reasoning in classifier construction

  • Authors:
  • Arkadiusz Wojna

  • Affiliations:
  • Institute of Informatics, Warsaw University, Warsaw, Poland

  • Venue:
  • Transactions on Rough Sets IV
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Analogy-based reasoning methods in machine learning make it possible to reason about properties of objects on the basis of similarities between objects. A specific similarity based method is the k nearest neighbors (k-nn) classification algorithm. In the k-nn algorithm, a decision about a new object x is inferred on the basis of a fixed number k of the objects most similar to x in a given set of examples. The primary contribution of the dissertation is the introduction of two new classification models based on the k-nn algorithm. The first model is a hybrid combination of the k-nn algorithm with rule induction. The proposed combination uses minimal consistent rules defined by local reducts of a set of examples. To make this combination possible the model of minimal consistent rules is generalized to a metric-dependent form. An effective polynomial algorithm implementing the classification model based on minimal consistent rules has been proposed by Bazan. We modify this algorithm in such a way that after addition of the modified algorithm to the k-nn algorithm the increase of the computation time is inconsiderable. For some tested classification problems the combined model was significantly more accurate than the classical k-nn classification algorithm. For many real-life problems it is impossible to induce relevant global mathematical models from available sets of examples. The second model proposed in the dissertation is a method for dealing with such sets based on locally induced metrics. This method adapts the notion of similarity to the properties of a given test object. It makes it possible to select the correct decision in specific fragments of the space of objects. The method with local metrics improved significantly the classification accuracy of methods with global models in the hardest tested problems. The important issues of quality and efficiency of the k-nn based methods are a similarity measure and the performance time in searching for the most similar objects in a given set of examples, respectively. In this dissertation both issues are studied in detail and some significant improvements are proposed for the similarity measures and for the search methods found in the literature.