Exact and approximate discrete optimization algorithms for finding useful disjunctions of categorical predicates in data analysis

Authors:
Endre Boros; Vladimir Menkov
Affiliations:
RUTCOR, Rutgers University, 640 Bartholomew Road, Piscataway, NJ; Aqsaqal Enterprises, Penticton, British Columbia, Canada
Venue:
Discrete Applied Mathematics - Discrete mathematics & data mining (DM & DM)
Year:
2004

Abstract

We discuss a discrete optimization problem that arises in data analysis from the binarization of categorical attributes. It can be described as the maximization of a function F(l1(x), l2(x)), where l1(x) and l2(x) are linear functions of the binary variables x ∈ {0,1}^n, and F : R^2 → R. Though this problem is NP-hard in general, an optimal solution x* can be found in pseudo-polynomial time under some mild monotonicity conditions on F. We also present an approximation algorithm that, for any given ε > 0, finds an approximate binary solution x^ε such that F(l1(x*), l2(x*)) - F(l1(x^ε), l2(x^ε)) < ε in O(n log n + 2C/√ε n) operations. Though in general C depends on the problem instance, for the problems arising from binarization of categorical variables it depends only on F, and for all functions considered we have C ≤ 1/√2.
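
To make the problem statement concrete, the following is a minimal sketch of a pseudo-polynomial search. It is not the authors' algorithm. It assumes, for illustration only, that the coefficients of l1 are nonnegative integers and that F is nonincreasing in its second argument, which is one possible reading of a "mild monotonicity condition". The names `maximize_dp`, `maximize_brute_force`, `a`, and `b` are hypothetical; `a` and `b` hold the coefficients of l1 and l2.

```python
from itertools import product


def maximize_dp(a, b, F):
    """Maximize F(a.x, b.x) over x in {0,1}^n with a knapsack-style table.

    Assumptions of this sketch (not taken from the paper):
      * every a[i] is a nonnegative integer, so l1(x) = a.x lies in 0..sum(a);
      * F(u, v) is nonincreasing in v for each fixed u.
    """
    total = sum(a)
    # best[s] = (smallest b.x among all x with a.x == s, indices set to 1),
    # or None if no x reaches a.x == s.
    best = [None] * (total + 1)
    best[0] = (0, ())
    for i, (ai, bi) in enumerate(zip(a, b)):
        # Scan s downward so that variable i is used at most once.
        for s in range(total - ai, -1, -1):
            if best[s] is None:
                continue
            cand = (best[s][0] + bi, best[s][1] + (i,))
            if best[s + ai] is None or cand[0] < best[s + ai][0]:
                best[s + ai] = cand
    # Because F is nonincreasing in v, only the smallest v per s can be optimal.
    value, s_star = max(
        (F(s, best[s][0]), s) for s in range(total + 1) if best[s] is not None
    )
    chosen = set(best[s_star][1])
    x = [1 if i in chosen else 0 for i in range(len(a))]
    return value, x


def maximize_brute_force(a, b, F):
    """Reference answer by enumerating all 2^n binary vectors (small n only)."""
    return max(
        F(sum(ai * xi for ai, xi in zip(a, x)),
          sum(bi * xi for bi, xi in zip(b, x)))
        for x in product((0, 1), repeat=len(a))
    )


if __name__ == "__main__":
    # Hypothetical instance: F rewards a large l1 and penalizes a large l2.
    a = [3, 1, 4, 2]
    b = [2, 0, 5, 1]
    F = lambda u, v: (u / 10) * (1 - v / 8)
    print(maximize_dp(a, b, F), maximize_brute_force(a, b, F))
```

The table has sum(a) + 1 entries, so the running time grows with the magnitude of the coefficients rather than only with n, which is what "pseudo-polynomial" refers to. The brute-force function is included only as a check on small instances.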