Several aspects may influence the performance achieved by existing learning systems. It has been reported that one of these aspects is class imbalance, in which the training examples belonging to one class heavily outnumber the examples in the other class. In this situation, which is found in real-world data describing an infrequent but important event, the learning system may have difficulty learning the concept related to the minority class. In this work we perform a broad experimental evaluation of ten methods for dealing with the class imbalance problem, three of them proposed by the authors, on thirteen UCI data sets. Our experiments provide evidence that class imbalance does not systematically hinder the performance of learning systems. Rather, the problem seems to be related to learning from too few minority class examples in the presence of other complicating factors, such as class overlapping. Two of our proposed methods address these conditions directly: they combine a known over-sampling method with data cleaning methods in order to produce better-defined class clusters. Our comparative experiments show that, in general, over-sampling methods provide more accurate results than under-sampling methods when evaluated by the area under the ROC curve (AUC). This result seems to contradict results previously published in the literature. Two of our proposed methods, Smote + Tomek and Smote + ENN, presented very good results for data sets with a small number of positive examples. Moreover, Random over-sampling, a very simple over-sampling method, is very competitive with more complex over-sampling methods. Since the over-sampling methods provided very good performance results, we also measured the syntactic complexity of the decision trees induced from over-sampled data. Our results show that these trees are usually more complex than the ones induced from the original data.
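The resampling ideas named above can be illustrated with a minimal pure-Python sketch. The sketch makes several assumptions. It handles two classes with numeric features only. It uses Euclidean distance and a brute-force neighbour search. All function names are hypothetical and are not the authors' code or any library's API.

Following our reading of the methods, the Tomek-link cleaning here removes both examples of each link, and ENN uses three neighbours. `smote_then_clean` first balances the classes with SMOTE and then applies one of these cleaning steps. The `auc` helper computes AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one.

```python
import math
import random

# Hypothetical helpers: a sketch of the techniques, not the paper's implementation.


def euclid(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def class_gap(y, minority):
    """How many more majority-class than minority-class examples there are."""
    n_min = sum(1 for c in y if c == minority)
    return (len(y) - n_min) - n_min


def random_oversample(X, y, minority, rng=None):
    """Duplicate randomly chosen minority examples until the classes are balanced."""
    rng = rng if rng is not None else random.Random()
    idx = [i for i, c in enumerate(y) if c == minority]
    extra = [list(X[rng.choice(idx)]) for _ in range(max(0, class_gap(y, minority)))]
    return X + extra, y + [minority] * len(extra)


def smote(X, y, minority, n_new, k=5, rng=None):
    """SMOTE-style interpolation: each synthetic example lies on the segment between
    a random minority example and one of its k nearest minority neighbours."""
    rng = rng if rng is not None else random.Random()
    idx = [i for i, c in enumerate(y) if c == minority]
    synth = []
    for _ in range(n_new):
        i = rng.choice(idx)
        neigh = sorted((j for j in idx if j != i), key=lambda j: euclid(X[i], X[j]))[:k]
        j = rng.choice(neigh)
        gap = rng.random()
        synth.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return X + synth, y + [minority] * n_new


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link, i.e. every pair of opposite-class
    examples that are each other's nearest neighbour (assumed cleaning variant)."""
    nn = [min((j for j in range(len(X)) if j != i), key=lambda j: euclid(X[i], X[j]))
          for i in range(len(X))]
    drop = {i for i in range(len(X)) if y[i] != y[nn[i]] and nn[nn[i]] == i}
    return ([x for i, x in enumerate(X) if i not in drop],
            [c for i, c in enumerate(y) if i not in drop])


def edited_nearest_neighbours(X, y, k=3):
    """Wilson's ENN: drop every example whose label disagrees with the majority
    of its k nearest neighbours, computed on the unedited data."""
    keep = []
    for i in range(len(X)):
        neigh = sorted((j for j in range(len(X)) if j != i),
                       key=lambda j: euclid(X[i], X[j]))[:k]
        if sum(1 for j in neigh if y[j] != y[i]) <= k // 2:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_then_clean(X, y, minority, clean, k=5, rng=None):
    """Balance the classes with SMOTE, then apply a cleaning step to the result."""
    Xs, ys = smote(X, y, minority, max(0, class_gap(y, minority)), k, rng)
    return clean(Xs, ys)


def auc(scores, labels, positive):
    """AUC as the probability that a random positive outscores a random negative
    (ties count one half)."""
    pos = [s for s, c in zip(scores, labels) if c == positive]
    neg = [s for s, c in zip(scores, labels) if c != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Under these assumptions, `smote_then_clean(X, y, 1, remove_tomek_links)` plays the role of Smote + Tomek, and `smote_then_clean(X, y, 1, edited_nearest_neighbours)` plays the role of Smote + ENN. The brute-force neighbour search is quadratic in the number of examples. That keeps the sketch short but would be too slow for large data sets.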
Regarding tree complexity, among the over-sampling methods investigated, Random over-sampling usually produced the smallest increase in the mean number of induced rules, and Smote + ENN produced the smallest increase in the mean number of conditions per rule.
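The two complexity measures can be made concrete by reading each root-to-leaf path of a decision tree as one rule, whose conditions are the tests along that path. The sketch below assumes a hypothetical `Node` class, not C4.5's own data structures. It counts conditions without any rule simplification, so it only approximates how the complexity was measured.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# Hypothetical tree representation, for illustration only.
@dataclass
class Node:
    """A leaf has a class label and no children; an internal node has a test
    and one child per test outcome (multiway splits are allowed)."""
    label: Optional[str] = None
    test: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def rule_lengths(node: Node, depth: int = 0) -> List[int]:
    """Number of conditions on each root-to-leaf path; each leaf yields one rule."""
    if not node.children:
        return [depth]
    lengths: List[int] = []
    for child in node.children:
        lengths.extend(rule_lengths(child, depth + 1))
    return lengths


def tree_complexity(root: Node) -> Tuple[int, float]:
    """Return (number of rules, mean number of conditions per rule)."""
    lengths = rule_lengths(root)
    return len(lengths), sum(lengths) / len(lengths)


# Example: the test "x1 <= 3" splits off a leaf; the other branch tests "x2 <= 7".
tree = Node(test="x1 <= 3", children=[
    Node(label="neg"),
    Node(test="x2 <= 7", children=[Node(label="pos"), Node(label="neg")]),
])
print(tree_complexity(tree))  # 3 rules, mean of 5/3 conditions per rule
```

Applying `tree_complexity` to a tree induced from the original data and to one induced from over-sampled data gives the increase in the number of rules. It also gives the increase in the mean number of conditions per rule.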