BRACID: a comprehensive approach to learning rules from imbalanced data

Authors:
Krystyna Napierala;Jerzy Stefanowski
Affiliations:
Institute of Computing Science, Poznan University of Technology, Poznan, Poland;Institute of Computing Science, Poznan University of Technology, Poznan, Poland
Venue:
Journal of Intelligent Information Systems
Year:
2012

Abstract

In this paper we consider the induction of rule-based classifiers from imbalanced data, where one class (the minority class) is under-represented in comparison to the remaining majority classes. The minority class is usually of primary interest. However, most rule-based classifiers are biased towards the majority classes and have difficulty recognizing the minority class correctly. We discuss the sources of these difficulties, which relate either to data characteristics or to the algorithm itself. Among the problems related to the data distribution, we focus on the role of small disjuncts, overlapping of classes and the presence of noisy examples. Then, we show that standard techniques for induction of rule-based classifiers, such as sequential covering, top-down induction of rules or classification strategies, were created with the assumption of a balanced data distribution, and we explain why they are biased towards the majority classes. Some modifications of rule-based classifiers have already been introduced, but they usually concentrate on individual problems. Therefore, we propose a novel algorithm, BRACID, which addresses the issues associated with imbalanced data more comprehensively. Its main characteristics include a hybrid representation of rules and single examples, bottom-up learning of rules and a local classification strategy using nearest rules. The usefulness of BRACID has been evaluated in experiments on several imbalanced datasets. The results show that BRACID significantly outperforms the well-known rule-based classifiers C4.5rules, RIPPER, PART, CN2 and MODLEM, as well as other related classifiers such as RISE and k-NN. Moreover, it is comparable to or better than the studied approaches specialized for imbalanced data, such as generalizations of rule algorithms or combinations of SMOTE + ENN preprocessing with PART. Finally, it improves the support of minority class rules, leading to better recognition of minority class examples.
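
The idea of a hybrid representation combined with a nearest-rule classification strategy can be illustrated with a minimal sketch. This is not the BRACID algorithm: the abstract does not specify its distance measure, its bottom-up generalization procedure or how it resolves ties, so everything below (the `Rule` class, `rule_distance`, `classify`, the range-normalized Euclidean distance and the single-nearest-rule decision) is a hypothetical simplification for numeric attributes only. It shows only that a single example can be stored as a degenerate rule, and that an uncovered example can be assigned to the class of the closest rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Rule:
    # One closed interval per numeric attribute. A single stored example is
    # represented as a rule whose intervals are degenerate (low == high).
    intervals: Dict[str, Tuple[float, float]]
    label: str


def rule_distance(rule: Rule, example: Dict[str, float],
                  ranges: Dict[str, float]) -> float:
    """Range-normalized Euclidean distance from an example to a rule.

    An attribute contributes 0 when the example's value lies inside the
    rule's interval, so a fully covered example has distance 0.
    (Assumed measure, chosen for illustration only.)
    """
    total = 0.0
    for attr, (low, high) in rule.intervals.items():
        x = example[attr]
        if x < low:
            gap = low - x
        elif x > high:
            gap = x - high
        else:
            gap = 0.0
        total += (gap / ranges[attr]) ** 2
    return total ** 0.5


def classify(rules: List[Rule], example: Dict[str, float],
             ranges: Dict[str, float]) -> str:
    """Return the label of the single nearest rule (an assumed strategy)."""
    nearest = min(rules, key=lambda r: rule_distance(r, example, ranges))
    return nearest.label


# Hypothetical toy data: one generalized majority rule and one minority
# example kept as a degenerate rule.
ranges = {"x": 10.0, "y": 10.0}
rules = [
    Rule({"x": (0.0, 2.0), "y": (0.0, 2.0)}, "majority"),
    Rule({"x": (5.0, 5.0), "y": (5.0, 5.0)}, "minority"),
]
covered = classify(rules, {"x": 1.0, "y": 1.0}, ranges)
uncovered = classify(rules, {"x": 4.5, "y": 4.8}, ranges)
```

In this toy setting the first query falls inside the majority rule, while the second is covered by no rule and is assigned to the class of the closest one, here the stored minority example. A decision made locally in this way does not depend on how many majority rules exist elsewhere in the attribute space.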