Mathematical Programming in Data Mining

  • Authors:
  • O. L. Mangasarian

  • Affiliations:
  • Computer Sciences Department, University of Wisconsin, Madison, WI 53706. E-mail: olvi@cs.wisc.edu

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 1997

Abstract

Mathematical programming approaches to three fundamental problems will be described: feature selection, clustering and robust representation. The feature selection problem considered is that of discriminating between two sets while recognizing irrelevant and redundant features and suppressing them. This creates a lean model that often generalizes better to new unseen data. Computational results on real data confirm improved generalization of leaner models. Clustering is exemplified by the unsupervised learning of patterns and clusters that may exist in a given database and is a useful tool for knowledge discovery in databases (KDD). A mathematical programming formulation of this problem is proposed that is theoretically justifiable and computationally implementable in a finite number of steps. A resulting k-Median Algorithm is utilized to discover very useful survival curves for breast cancer patients from a medical database. Robust representation is concerned with minimizing trained model degradation when applied to new problems. A novel approach is proposed that purposely tolerates a small error in the training process in order to avoid overfitting data that may contain errors. Examples of applications of these concepts are given.
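
The abstract names a k-Median Algorithm but does not spell out its steps. The following is only a minimal sketch of a generic k-median iteration, under the assumption that points are assigned to the nearest center in the 1-norm and each center is then moved to the coordinate-wise median of its cluster. The function name `k_median`, its parameters, and the toy data are hypothetical and chosen for illustration; this is not the paper's implementation.

```python
import numpy as np

def k_median(points, centers, max_iter=100):
    """Generic k-median sketch: 1-norm assignment, coordinate-wise median update."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        # Assign each point to the nearest center, measured in the 1-norm.
        dists = np.abs(points[:, None, :] - centers[None, :, :]).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing, so the iteration has settled
        labels = new_labels
        # Move each center to the coordinate-wise median of its assigned points.
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous center
                centers[j] = np.median(members, axis=0)
    return centers, labels

# Toy data (invented for illustration): two well-separated groups in the plane.
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, labels = k_median(data, centers=[[0.0, 0.0], [10.0, 10.0]])
```

Because there are only finitely many ways to assign points to clusters, an iteration of this kind can stop once the assignments repeat, which is in the spirit of the finite termination claimed above. The `max_iter` cap is an extra safeguard added for this sketch.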