Construct robust rule sets for classification

Authors:
Jiuyong Li;Rodney Topor;Hong Shen
Affiliations:
The University of Southern Queensland, Australia;Griffith University, Australia;Japan Advanced Institute of Science and Technology, Japan, 923-1292
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002

Abstract

We study the problem of computing classification rule sets from relational databases so that accurate predictions can be made on test data with missing attribute values. Traditional classifiers perform poorly when test data are less complete than the training data because they fit the training database too closely. We introduce the concept of one rule set being more robust than another, that is, able to make more accurate predictions on test data with missing attribute values. We show that the optimal class association rule set is as robust as the complete class association rule set. We then introduce the k-optimal rule set, which makes exactly the same predictions as the optimal class association rule set on test data with up to k missing attribute values. This leads to a hierarchy of k-optimal rule sets in which decreasing size corresponds to decreasing robustness, and all of them are more robust than a traditional classification rule set. We introduce two methods for finding k-optimal rule sets: an optimal association rule mining approach and a heuristic approximate approach. We show experimentally that a k-optimal rule set generated by the optimal association rule mining approach performs better than one generated by the heuristic approximate approach, and that both perform significantly better than a typical classification rule set (C4.5Rules) on incomplete test data.
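
The idea of robustness to missing values can be illustrated with a small sketch. This is not the authors' algorithm: the rule representation, the `predict` function, the highest-confidence matching strategy, and all attribute names and confidence values are hypothetical, chosen only to show why a rule set that keeps an extra rule can still predict correctly when an attribute value is missing.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule format: (antecedent, predicted class, confidence).
Rule = Tuple[Dict[str, str], str, float]
Record = Dict[str, Optional[str]]


def predict(rules: List[Rule], record: Record, default: str) -> str:
    """Return the class of the highest-confidence rule whose antecedent is
    fully satisfied by the record. A missing value (None) never matches.
    Falls back to the default class when no rule matches."""
    matching = [
        (conf, cls)
        for antecedent, cls, conf in rules
        if all(record.get(attr) == val for attr, val in antecedent.items())
    ]
    if not matching:
        return default
    return max(matching, key=lambda m: m[0])[1]


# A minimal rule set, and a larger one that keeps an additional rule.
# All values are made up for illustration.
small_rules: List[Rule] = [({"outlook": "sunny"}, "no", 0.9)]
robust_rules: List[Rule] = small_rules + [({"humidity": "high"}, "no", 0.8)]

complete: Record = {"outlook": "sunny", "humidity": "high"}
incomplete: Record = {"outlook": None, "humidity": "high"}  # one missing value

on_complete = predict(small_rules, complete, default="yes")
small_on_incomplete = predict(small_rules, incomplete, default="yes")
robust_on_incomplete = predict(robust_rules, incomplete, default="yes")
```

On the complete record both rule sets agree. On the incomplete record the smaller set has no matching rule and falls back to the default class, while the larger set still makes the same prediction it would have made on the complete record. Under these assumptions, that is the sense in which one rule set is more robust than another.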