Highly scalable and robust rule learner: performance evaluation and comparison

Authors:
L. A. Kurgan;K. J. Cios;S. Dick
Affiliations:
Dept. of Electr. & Comput. Eng., Univ. of Alberta, Edmonton, Alta., Canada;-;-
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Year:
2006

Abstract

Business intelligence and bioinformatics applications increasingly require mining datasets that consist of millions of data points, or building real-time, enterprise-level decision support systems for large corporations and drug companies. In all cases, the underlying data mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy rule builder that generates a set of production rules from labeled input data. Despite its relative simplicity, DataSqueezer is a very effective learner. The rules it generates are compact and comprehensible, and their accuracy is comparable to that of rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency and resistance to missing data. DataSqueezer exhibits log-linear asymptotic complexity in the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.
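
The abstract does not spell out how DataSqueezer builds its rules, so the following is only a minimal sketch of a generic greedy, sequential-covering rule builder of the kind the abstract's family description suggests. It is not the DataSqueezer algorithm, and it makes no attempt at log-linear complexity. All names (`matches`, `learn_one_rule`, `learn_rules`), the scoring heuristic, the toy dataset, and the choice to treat a missing value (`None`) as never satisfying a test are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Example = Dict[str, Optional[str]]  # attribute -> value; None marks a missing value
Rule = Dict[str, str]               # conjunction of "attribute == value" tests


def matches(rule: Rule, example: Example) -> bool:
    """A rule fires only if every tested attribute is present and equal."""
    return all(example.get(attr) == val for attr, val in rule.items())


def score(test: Tuple[str, str], pos: List[Example],
          neg: List[Example]) -> Tuple[int, int, str, str]:
    """Assumed heuristic: positives kept minus negatives kept, with
    deterministic tie-breaking on coverage and then on the test itself."""
    attr, val = test
    p = sum(1 for ex in pos if ex.get(attr) == val)
    n = sum(1 for ex in neg if ex.get(attr) == val)
    return (p - n, p, attr, val)


def learn_one_rule(pos: List[Example], neg: List[Example]) -> Rule:
    """Greedily add tests until no negative example is covered."""
    rule: Rule = {}
    while neg:
        # Candidate tests come from non-missing values of covered positives.
        candidates = {(a, v) for ex in pos for a, v in ex.items()
                      if v is not None and a not in rule}
        if not candidates:
            break  # cannot specialize further (e.g. contradictory data)
        attr, val = max(candidates, key=lambda t: score(t, pos, neg))
        rule[attr] = val
        pos = [ex for ex in pos if ex.get(attr) == val]
        neg = [ex for ex in neg if ex.get(attr) == val]
    return rule


def learn_rules(data: List[Tuple[Example, str]], target: str) -> List[Rule]:
    """Sequential covering: learn a rule, drop the positives it covers, repeat."""
    pos = [ex for ex, label in data if label == target]
    neg = [ex for ex, label in data if label != target]
    rules: List[Rule] = []
    while pos:
        rule = learn_one_rule(pos, neg)
        if not rule and neg:
            break  # leftover positives have no usable values; give up on them
        rules.append(rule)
        pos = [ex for ex in pos if not matches(rule, ex)]
    return rules


# Toy labeled data with missing values (hypothetical, for illustration only).
data = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": None}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": None, "windy": "yes"}, "stay"),
]
rules = learn_rules(data, "play")
print(rules)
```

Each learned rule reads as a production rule of the form "IF attribute = value AND ... THEN play". Treating a missing value as a non-match keeps incomplete examples in the training set instead of discarding them, which is one simple way to tolerate missing data; the paper's actual mechanism may differ.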