Reducing redundancy in characteristic rule discovery by using integer programming techniques

Authors:
Tom Brijs; Koen Vanhoof; Geert Wets
Affiliations:
Department of Applied Economics, Limburg University Centre, B-3590 Diepenbeek, Belgium. E-mail: {tom.brijs, koen.vanhoof, geert.wets}@luc.ac.be. Correspondence: Tel.: +32 11 268621; http://hyper.luc.ac.be
Venue:
Intelligent Data Analysis
Year:
2000



Abstract

The discovery of characteristic rules is a well-known data mining task and has led to several successful applications. However, because characteristic rules are descriptive in nature, the mining stage typically discovers a very large number of them, which makes monitoring and controlling these rules extremely costly and difficult in practice. Selecting the most promising subset of rules is therefore desirable. Several heuristic rule selection methods have been proposed in the literature to deal with this issue. In this paper, we propose an integer programming model that solves the problem of optimally selecting the most promising subset of characteristic rules. Moreover, the proposed technique lets the user enforce a user-defined level of overall model quality while maximally reducing the redundancy present in the original ruleset. We use real-world data to empirically evaluate the benefits and performance of the proposed technique against the well-known RuleCover heuristic. The results demonstrate that the proposed integer programming techniques significantly reduce both the number of retained rules and the level of redundancy in the final ruleset. They also show that the overall quality of the final ruleset, measured by its discriminant power, increases slightly when integer programming methods are used.
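
The abstract does not state the integer programming formulation, so the following is only an illustrative sketch under a loudly stated assumption: that rule selection can be viewed as a covering problem, where each rule covers a set of instances and the goal is to keep as few rules as possible while covering at least a user-defined fraction of the instances. The rule names, the toy data, the functions greedy_cover, optimal_cover and redundancy, and the parameter min_coverage are all hypothetical. The greedy function is a generic covering heuristic, not a reproduction of RuleCover, and the exact search simply enumerates subsets by increasing size in place of a real integer programming solver.

```python
from itertools import combinations

# Hypothetical toy data: each rule maps to the set of instances it covers.
instances = {1, 2, 3, 4, 5, 6}
rules = {
    "A": {2, 3, 4, 5},
    "B": {1, 2, 3},
    "C": {4, 5, 6},
}


def greedy_cover(rules, instances, min_coverage=1.0):
    """Generic greedy heuristic: repeatedly pick the rule that covers
    the most not-yet-covered instances (ties broken by rule name)."""
    target = min_coverage * len(instances)
    covered, chosen = set(), []
    while len(covered) < target:
        best = max(sorted(rules), key=lambda r: len((rules[r] & instances) - covered))
        gain = (rules[best] & instances) - covered
        if not gain:  # no rule adds coverage, so the target is unreachable
            break
        chosen.append(best)
        covered |= gain
    return chosen


def optimal_cover(rules, instances, min_coverage=1.0):
    """Exact answer to the assumed 0-1 program: the smallest subset of
    rules whose union covers at least min_coverage of the instances.
    Brute force, so only suitable for a handful of rules."""
    target = min_coverage * len(instances)
    names = sorted(rules)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            covered = set().union(*(rules[r] for r in subset)) & instances
            if len(covered) >= target:
                return list(subset)
    return None  # even all rules together miss the target


def redundancy(chosen, rules):
    """Number of surplus coverings: each instance contributes the number
    of selected rules covering it, minus one."""
    counts = {}
    for r in chosen:
        for i in rules[r]:
            counts[i] = counts.get(i, 0) + 1
    return sum(c - 1 for c in counts.values())


greedy = greedy_cover(rules, instances)
optimal = optimal_cover(rules, instances)
print(greedy, redundancy(greedy, rules))
print(optimal, redundancy(optimal, rules))
```

On this toy input the greedy heuristic starts with the largest rule and ends up keeping all three rules, while the exact search finds that two rules suffice and overlap on no instance. This only illustrates why an exact formulation can retain fewer rules with less redundancy than a heuristic; it says nothing about the size of the effect reported in the paper.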