A nonparametric classification method based on K-associated graphs

  • Authors:
  • João Roberto Bertini, Jr.; Liang Zhao; Robson Motta; Alneu de Andrade Lopes

  • Affiliations:
  • Institute of Mathematics and Computer Science, University of São Paulo, Av. Trabalhador São-Carlense 400, São Carlos, SP 13560-970, Brazil (all authors)

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2011

Abstract

Graphs are a powerful representation formalism that has been widely employed in machine learning and data mining. In this paper, we present a graph-based classification method built on a special graph, referred to as the K-associated graph, which can represent both similarity relationships among data cases and the proportion of class overlap. The main properties of K-associated graphs and the classification algorithm are described. Experimental evaluation indicates that the proposed technique captures the topological structure of the training data and leads to good classification results, particularly for noisy data. Compared with other well-known classification techniques, the proposed approach has the following features: (1) A new measure, called purity, is introduced not only to characterize the degree of overlap among classes in the input data set, but also to construct the K-associated optimal graph used for classification. (2) Classification is nonlinear, with automatic local adaptation to the input data: in contrast to the K-nearest neighbor classifier, which uses a fixed K, the proposed algorithm automatically considers different values of K to best fit the class overlap in different data subspaces, revealing both the local and global structure of the input data. (3) The proposed classification algorithm is nonparametric, implying high efficiency and no need for model selection in practical applications.
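
The abstract does not give formal definitions, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes that a K-associated graph links each data case to those of its K nearest neighbors that share its class label, and that the purity of a connected component is its average degree divided by 2K. The function names (k_associated_graph, components, purity) and the Euclidean distance are hypothetical choices made for illustration, and the sketch uses a single fixed K rather than the per-component selection of K described above.

```python
import math
from collections import defaultdict


def k_associated_graph(points, labels, k):
    """Directed edges i -> j where j is among i's k nearest neighbors
    and has the same label as i (assumed definition)."""
    n = len(points)
    edges = []
    for i in range(n):
        # Rank all other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            if labels[j] == labels[i]:
                edges.append((i, j))
    return edges


def components(n, edges):
    """Weakly connected components, found with a small union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for v in range(n):
        groups[find(v)].append(v)
    return list(groups.values())


def purity(component, edges, k):
    """Assumed definition: average (in + out) degree of the component
    divided by 2k, so 1.0 means no neighbor came from another class."""
    members = set(component)
    internal = sum(1 for i, j in edges if i in members and j in members)
    # Each directed edge contributes one out-degree and one in-degree.
    average_degree = 2 * internal / len(component)
    return average_degree / (2 * k)


# Two well-separated classes: every neighbor shares the class.
clean_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clean_labels = ["a", "a", "a", "b", "b", "b"]
clean_edges = k_associated_graph(clean_points, clean_labels, k=2)
clean_purities = [
    purity(c, clean_edges, k=2) for c in components(6, clean_edges)
]

# One "b" case placed inside class "a" lowers the purity of that region.
noisy_points = [(0, 0), (0, 1), (1, 0), (0.4, 0.4)]
noisy_labels = ["a", "a", "a", "b"]
noisy_edges = k_associated_graph(noisy_points, noisy_labels, k=2)
noisy_purities = sorted(
    purity(c, noisy_edges, k=2) for c in components(4, noisy_edges)
)
```

Under these assumptions, a component whose members find all K neighbors within their own class has purity 1, while overlap with another class removes edges and pushes the purity toward 0, which matches the abstract's description of purity as a measure of class overlap.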