Approximation of Optimal Two-Dimensional Association Rules for Categorical Attributes Using Semidefinite Programming

Authors:
Katsuki Fujisawa; Yukinobu Hamuro; Naoki Katoh; Takeshi Tokuyama; Katsutoshi Yada
Venue:
DS '99 Proceedings of the Second International Conference on Discovery Science
Year:
1999

Citing 19
Cited 2

The NP-completeness column: An ongoing guide

Journal of Algorithms
Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Polynomial time approximation schemes for dense instances of NP-hard problems

STOC '95 Proceedings of the twenty-seventh annual ACM symposium on Theory of computing
Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Exploiting sparsity in primal-dual interior-point methods for semidefinite programming

Mathematical Programming: Series A and B - Special issue: papers from ISMP97, the 16th International Symposium on Mathematical Programming, Lausanne EPFL
On the boosting ability of top-down decision tree learning algorithms

Journal of Computer and System Sciences
Polynomial-time solutions to image segmentation

Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms
Computers and Intractability: A Guide to the Theory of NP-Completeness
C4.5: Programs for Machine Learning
Interior-Point Methods for the Monotone Semidefinite Linear Complementarity Problem in Symmetric Matrices

SIAM Journal on Optimization
Implementation and Evaluation of Decision Trees with Range and Region Splitting

Constraints
Mining Pharmacy Data Helps to Make Profits

Data Mining and Knowledge Discovery
Induction of Decision Trees

Machine Learning
Algorithms for Mining Association Rules for Binary Segmentations of Huge Categorical Databases

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Greedily Finding a Dense Subgraph

SWAT '96 Proceedings of the 5th Scandinavian Workshop on Algorithm Theory
Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules

VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Finding Dense Subgraphs with Semidefinite Programming

APPROX '98 Proceedings of the International Workshop on Approximation Algorithms for Combinatorial Optimization
On the densest k-subgraph problems
On choosing a dense subgraph

SFCS '93 Proceedings of the 1993 IEEE 34th Annual Foundations of Computer Science

Discovering Interpretable Rules that Explain Customers' Brand Choice Behavior

DS '00 Proceedings of the Third International Conference on Discovery Science
Data mining oriented CRM systems based on MUSASHI: C-MUSASHI

AM'03 Proceedings of the Second international conference on Active Mining

Abstract

We consider the problem of finding two-dimensional association rules for categorical attributes. Suppose we have two conditional attributes A and B, both with categorical domains, and one binary target attribute whose domain is {"positive", "negative"}. We want to split the Cartesian product of the domains of A and B into two subsets so that a certain objective function is optimized, i.e., we want to find a good segmentation of the domains of A and B. In this paper, the objective is to maximize the confidence subject to an upper bound on the support size. We first prove that the problem is NP-hard, and then propose an approximation algorithm based on semidefinite programming. To evaluate the effectiveness and efficiency of the proposed algorithm, we carry out computational experiments on problem instances generated from real sales data, with attributes whose domain sizes are at most a few hundred. The approximation ratios of the solutions obtained, measured against the solutions of the semidefinite programming relaxation, range from 76% to 95%. We observe that the generated association rules perform significantly better than one-dimensional rules.
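To make the terms "support" and "confidence" in the abstract concrete, the following is a minimal Python sketch of the problem on a toy instance. It is not the semidefinite programming algorithm proposed in the paper. It assumes, as an illustration only, that the selected region is a product S × T with S a subset of the domain of A and T a subset of the domain of B, and it searches all such regions by brute force. The names `region_stats` and `best_region`, the record layout, and the tie-break toward larger support are hypothetical choices not taken from the source.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a tuple."""
    items = list(items)
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def region_stats(records, S, T):
    """Return (support, hits) of the region S x T.

    records is a list of (a, b, is_positive) triples (assumed layout).
    support counts records falling in the region; hits counts the
    positive ones among them.
    """
    S, T = set(S), set(T)
    support = sum(1 for a, b, _ in records if a in S and b in T)
    hits = sum(1 for a, b, pos in records if a in S and b in T and pos)
    return support, hits


def best_region(records, max_support):
    """Brute-force search for the region S x T with the highest confidence
    (hits / support) among regions with 0 < support <= max_support.

    Ties in confidence are broken toward larger support (an assumption).
    Returns (confidence, support, S, T), or None if no region is feasible.
    """
    dom_a = sorted({a for a, _, _ in records})
    dom_b = sorted({b for _, b, _ in records})
    best = None
    for S in nonempty_subsets(dom_a):
        for T in nonempty_subsets(dom_b):
            support, hits = region_stats(records, S, T)
            if support == 0 or support > max_support:
                continue
            candidate = (hits / support, support, S, T)
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    return best


if __name__ == "__main__":
    # Toy data: cell -> (number of records, number of positive records).
    cells = {
        ("a1", "b1"): (3, 3),
        ("a1", "b2"): (2, 1),
        ("a2", "b1"): (2, 2),
        ("a2", "b2"): (3, 0),
    }
    records = [
        (a, b, i < positives)
        for (a, b), (total, positives) in cells.items()
        for i in range(total)
    ]
    print(best_region(records, max_support=5))
```

The search enumerates every pair of non-empty subsets, so its cost grows exponentially with the domain sizes. It is only usable on tiny instances, unlike the instances with domains of up to a few hundred values that the abstract reports.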