Exact and approximate discrete optimization algorithms for finding useful disjunctions of categorical predicates in data analysis

  • Authors:
  • Endre Boros; Vladimir Menkov

  • Affiliations:
  • RUTCOR, Rutgers University, 640 Bartholomew Road, Piscataway, NJ; Aqsaqal Enterprises, Penticton, British Columbia, Canada

  • Venue:
  • Discrete Applied Mathematics - Discrete mathematics & data mining (DM & DM)
  • Year:
  • 2004

Abstract

We discuss a discrete optimization problem that arises in data analysis from the binarization of categorical attributes. It can be described as the maximization of a function F(l1(x), l2(x)), where l1(x) and l2(x) are linear functions of binary variables x ∈ {0,1}^n, and F : R^2 → R. Though this problem is NP-hard in general, an optimal solution x* can be found in pseudo-polynomial time under some mild monotonicity conditions on F. We also present an approximation algorithm which, for any given ε > 0, finds an approximate binary solution xε such that F(l1(x*), l2(x*)) - F(l1(xε), l2(xε)) < ε, in O(n log n + 2C/√εn) operations. Though in general C depends on the problem instance, for the problems arising from binarization of categorical variables it depends only on F, and for all functions considered we have C ≤ 1/√2.
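
To make the problem statement concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes that the coefficients of l1 and l2 are non-negative integers, and it enumerates every attainable pair (l1(x), l2(x)) one variable at a time. The number of distinct pairs is bounded by (sum of a + 1)(sum of b + 1), which is why this kind of enumeration runs in pseudo-polynomial time. The function names, the sample coefficients, and the objective F(u, v) = u(9 - v) are hypothetical and chosen only for illustration; the sketch does not rely on any monotonicity of F.

```python
from itertools import product


def maximize_by_reachable_pairs(a, b, F):
    """Maximize F(l1(x), l2(x)) over x in {0,1}^n, where
    l1(x) = sum(a[i] * x[i]) and l2(x) = sum(b[i] * x[i]).

    Assumption (not from the paper): a and b hold non-negative integers.
    """
    # Map each attainable (l1, l2) pair to one binary vector that attains it.
    reachable = {(0, 0): ()}
    for ai, bi in zip(a, b):
        extended = {}
        for (u, v), x in reachable.items():
            extended.setdefault((u, v), x + (0,))            # variable set to 0
            extended.setdefault((u + ai, v + bi), x + (1,))  # variable set to 1
        reachable = extended
    best_pair = max(reachable, key=lambda pair: F(*pair))
    return F(*best_pair), reachable[best_pair]


def maximize_by_brute_force(a, b, F):
    """Reference check: try all 2^n binary vectors."""
    best_value, best_x = None, None
    for x in product((0, 1), repeat=len(a)):
        u = sum(ai * xi for ai, xi in zip(a, x))
        v = sum(bi * xi for bi, xi in zip(b, x))
        value = F(u, v)
        if best_value is None or value > best_value:
            best_value, best_x = value, x
    return best_value, best_x


# Hypothetical instance: a[i] and b[i] could be counts of positive and
# negative observations taking the i-th value of a categorical attribute.
a = [3, 1, 4, 2]
b = [1, 4, 1, 3]


def F(u, v):
    # Hypothetical objective: reward covered positives, penalize covered negatives.
    return u * (9 - v)


value, x = maximize_by_reachable_pairs(a, b, F)
print(value, x)
```

On small instances the result can be compared against `maximize_by_brute_force`, which is exponential in n and serves only as a correctness check.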