On the Selection of the Globally Optimal Prototype Subset for Nearest-Neighbor Classification

  • Authors:
  • Emilio Carrizosa; Belén Martín-Barragán; Frank Plastria; Dolores Romero Morales

  • Affiliations:
  • Facultad de Matemáticas, Universidad de Sevilla, 41012 Sevilla, Spain; Departamento de Estadística, Universidad Carlos III de Madrid, 28903 Getafe, Madrid, Spain; MOSI--Department of Mathematics, Operational Research, Statistics and Information Systems for Management, Vrije Universiteit Brussel, B-1050 Brussel, Belgium; Saïd Business School, University of Oxford, Oxford OX1 1HP, United Kingdom

  • Venue:
  • INFORMS Journal on Computing
  • Year:
  • 2007

Abstract

The nearest-neighbor classifier has been shown to be a powerful tool for multiclass classification. We explore both theoretical properties and empirical behavior of a variant method, in which the nearest-neighbor rule is applied to a reduced set of prototypes. This set is selected a priori by fixing its cardinality and minimizing the empirical misclassification cost. In this way we alleviate the two serious drawbacks of the nearest-neighbor method: high storage requirements and time-consuming queries. Finding this reduced set is shown to be NP-hard. We provide mixed integer programming (MIP) formulations, which are theoretically compared and solved by a standard MIP solver for small problem instances. We show that the classifiers derived from these formulations are comparable to benchmark procedures. We solve large problem instances by a metaheuristic that yields good classification rules in reasonable time. Additional experiments indicate that prototype-based nearest-neighbor classifiers remain quite stable in the presence of missing values.
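The problem the abstract describes can be sketched as follows: pick p training points as prototypes so that the 1-NN rule restricted to them has minimum total misclassification cost on the training sample. This is a minimal illustration under stated assumptions, not the authors' formulation or code. It assumes Euclidean distance, prototypes drawn from the training points themselves, and a 0-1 cost unless a dictionary of (true, predicted) costs is supplied. Exhaustive enumeration stands in for the exact MIP approach and is only practical for tiny instances, in line with the NP-hardness result. The swap search is a generic local-search heuristic and is not the metaheuristic used in the paper. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def classify(x, prototypes, X, y):
    """1-NN rule restricted to the prototype indices (Euclidean distance)."""
    # min() keeps the first prototype on ties, so results are deterministic.
    nearest = min(prototypes, key=lambda j: math.dist(x, X[j]))
    return y[nearest]


def empirical_cost(prototypes, X, y, cost=None):
    """Total misclassification cost of the prototype rule on the training sample.

    cost maps (true_label, predicted_label) -> cost; None means 0-1 loss.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        pred = classify(xi, prototypes, X, y)
        if cost is None:
            total += float(pred != yi)
        else:
            total += cost.get((yi, pred), 0.0)
    return total


def exact_prototypes(X, y, p, cost=None):
    """Enumerate all C(n, p) subsets; only viable for very small n."""
    best = min(combinations(range(len(X)), p),
               key=lambda s: empirical_cost(s, X, y, cost))
    return list(best), empirical_cost(best, X, y, cost)


def swap_local_search(X, y, p, cost=None, seed=0, max_rounds=100):
    """Generic first-improvement swap heuristic (hypothetical, not the paper's method)."""
    rng = random.Random(seed)
    current = rng.sample(range(len(X)), p)
    best_cost = empirical_cost(current, X, y, cost)
    for _ in range(max_rounds):
        improved = False
        for pos in range(p):
            for cand in range(len(X)):
                if cand in current:
                    continue
                trial = current.copy()
                trial[pos] = cand  # swap one prototype for a non-prototype
                trial_cost = empirical_cost(trial, X, y, cost)
                if trial_cost < best_cost:
                    current, best_cost, improved = trial, trial_cost, True
        if not improved:  # local optimum with respect to single swaps
            break
    return sorted(current), best_cost


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    y = [0, 0, 0, 1, 1, 1]
    print(exact_prototypes(X, y, p=2))      # ([0, 3], 0.0)
    print(swap_local_search(X, y, p=2)[1])  # 0.0
```

On two well-separated clusters, one prototype per class already gives zero empirical cost, so storage and query time shrink from n points to p points. The local search evaluates every single swap per round, which costs O(p·n) cost evaluations of O(n·p) distance computations each; it is meant only to show the shape of a heuristic search over fixed-cardinality subsets.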