Classification with feature selection via mathematical programming

  • Authors:
  • Stanislav Busygin; Panos M. Pardalos

  • Affiliations:
  • Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL; Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL

  • Venue:
  • ICCOMP'05 Proceedings of the 9th WSEAS International Conference on Computers
  • Year:
  • 2005

Abstract

Given a set of training samples partitioned into a number of classes and a set of test samples whose classification is unknown, the classification problem consists of determining the classes of the test samples using the information provided by the training set. Usually, not all features of the data set are informative for the classification, so a subset of relevant features must be found. This task is called feature selection. We approach it from the viewpoint of mathematical programming: we take several unsupervised clustering principles as constraints and express the desirable properties of the feature selection as the objective function. In particular, we consider k-means local optimality constraints, pairwise threshold constraints, and biclustering consistency constraints. The objectives either maximize the separation of classes or minimize the information loss. The resulting optimization-based approach has shown good performance on well-known DNA microarray data sets.
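
The abstract does not state the actual formulations, so the following is only a minimal sketch of one possible reading of the k-means local optimality idea, not the authors' method. It assumes that a feature subset is acceptable when every training sample lies strictly closer (in squared Euclidean distance over the selected features) to its own class centroid than to any other class centroid, and it uses "keep as many features as possible" as a stand-in for minimizing information loss. All function names, the toy data, and the brute-force search are hypothetical; a real mathematical programming model would replace the enumeration with a solver.

```python
from itertools import combinations


def centroids(samples, labels, features):
    """Per-class mean of the training samples, restricted to `features`."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += x[f]
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def sq_dist(x, center, features):
    """Squared Euclidean distance from x to a centroid over `features`."""
    return sum((x[f] - c) ** 2 for f, c in zip(features, center))


def is_kmeans_consistent(samples, labels, features):
    """Assumed constraint: each sample is strictly nearest to its own class centroid."""
    cents = centroids(samples, labels, features)
    for x, y in zip(samples, labels):
        own = sq_dist(x, cents[y], features)
        if any(sq_dist(x, c, features) <= own
               for k, c in cents.items() if k != y):
            return False
    return True


def largest_consistent_subsets(samples, labels):
    """Brute-force search, largest subsets first (toy problem sizes only)."""
    n = len(samples[0])
    for size in range(n, 0, -1):
        found = [s for s in combinations(range(n), size)
                 if is_kmeans_consistent(samples, labels, s)]
        if found:
            return found
    return []


def classify(x, samples, labels, features):
    """Assign x to the class with the nearest centroid over `features`."""
    cents = centroids(samples, labels, features)
    return min(cents, key=lambda k: sq_dist(x, cents[k], features))


# Hypothetical toy data: feature 0 separates the classes, feature 1 is
# large-scale noise, feature 2 is small-scale and nearly uninformative.
train = [(0.0, 0.0, 1.0), (1.0, 40.0, 1.2), (9.0, 20.0, 0.9), (10.0, 60.0, 1.1)]
train_labels = ["A", "A", "B", "B"]

best = largest_consistent_subsets(train, train_labels)
test_sample = (2.0, 55.0, 1.0)
with_selection = classify(test_sample, train, train_labels, best[0])
without_selection = classify(test_sample, train, train_labels, (0, 1, 2))
```

On this toy data the only two-feature subset that satisfies the assumed constraint drops the noisy feature 1, and the test sample is assigned to a different class depending on whether that feature is kept. The example is meant only to show why treating a clustering principle as a constraint can filter out uninformative features.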