This paper considers the induction of rule-based classifiers from imbalanced data, where one class (the minority class) is under-represented in comparison to the remaining majority classes. The minority class is usually of primary interest. However, most rule-based classifiers are biased towards the majority classes and have difficulty recognizing the minority class correctly. We discuss the sources of these difficulties, which are related either to data characteristics or to the learning algorithm itself. Among the problems related to the data distribution, we focus on the role of small disjuncts, overlapping of classes and the presence of noisy examples. We then show that standard techniques for the induction of rule-based classifiers, such as sequential covering, top-down induction of rules or classification strategies, were created under the assumption of a balanced data distribution, and we explain why they are biased towards the majority classes. Some modifications of rule-based classifiers have already been introduced, but they usually concentrate on individual problems. Therefore, we propose a novel algorithm, BRACID, which addresses the issues associated with imbalanced data more comprehensively. Its main characteristics include a hybrid representation of rules and single examples, bottom-up learning of rules and a local classification strategy using nearest rules. The usefulness of BRACID has been evaluated in experiments on several imbalanced datasets. The results show that BRACID significantly outperforms the well-known rule-based classifiers C4.5rules, RIPPER, PART, CN2 and MODLEM, as well as other related classifiers such as RISE and K-NN. Moreover, it is comparable to or better than the studied approaches specialized for imbalanced data, such as generalizations of rule algorithms or combinations of SMOTE + ENN preprocessing with PART. Finally, it improves the support of minority class rules, leading to better recognition of minority class examples.
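The three characteristics named above (a hybrid representation of rules and single examples, bottom-up learning, and classification by the nearest rule) can be illustrated with a small sketch. This is not the BRACID algorithm: the abstract does not specify its distance measure, generalization criterion or evaluation of candidate rules, so everything below is an assumption made for illustration. The names `Rule`, `rule_from_example`, `generalize`, `distance` and `classify` are hypothetical, and the distance used is a plain range-normalized Euclidean measure with a 0/1 mismatch for nominal attributes.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Rule:
    # attribute -> (low, high) interval for numeric attributes,
    # or a set of allowed values for nominal attributes
    conditions: dict
    label: str


def rule_from_example(example, label):
    """Hybrid representation: a single example is the most specific rule."""
    conditions = {}
    for attr, value in example.items():
        if isinstance(value, (int, float)):
            conditions[attr] = (value, value)
        else:
            conditions[attr] = {value}
    return Rule(conditions, label)


def generalize(rule, example):
    """Bottom-up step: minimally widen the rule so that it covers the example."""
    conditions = {}
    for attr, cond in rule.conditions.items():
        value = example[attr]
        if isinstance(cond, tuple):
            conditions[attr] = (min(cond[0], value), max(cond[1], value))
        else:
            conditions[attr] = cond | {value}
    return Rule(conditions, rule.label)


def distance(rule, example, ranges):
    """Assumed measure: zero inside the rule, growing with the gap outside it."""
    total = 0.0
    for attr, cond in rule.conditions.items():
        value = example[attr]
        if isinstance(cond, tuple):
            low, high = cond
            if value < low:
                d = (low - value) / ranges[attr]
            elif value > high:
                d = (value - high) / ranges[attr]
            else:
                d = 0.0
        else:
            d = 0.0 if value in cond else 1.0
        total += d * d
    return sqrt(total)


def classify(rules, example, ranges):
    """Local strategy: predict the label of the nearest rule."""
    return min(rules, key=lambda r: distance(r, example, ranges)).label


# Toy data (invented for illustration): one generalized minority rule
# and one majority example kept as a maximally specific rule.
minority_rule = generalize(
    rule_from_example({"age": 30, "smoker": "yes"}, "minority"),
    {"age": 40, "smoker": "yes"},
)
majority_rule = rule_from_example({"age": 60, "smoker": "no"}, "majority")
rules = [minority_rule, majority_rule]
ranges = {"age": 50}  # assumed value range of the numeric attribute

print(classify(rules, {"age": 35, "smoker": "yes"}, ranges))
```

In this sketch a new example that falls inside the minority rule is at distance zero from it, so it is assigned to the minority class even though no global rule ordering or majority default is consulted. That is the intuition behind a local, nearest-rule strategy, under the simplifying assumptions stated above.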