Learning from imbalanced data sets, in which one (majority) class has many more examples than the others, presents an important challenge to the machine learning community. Traditional machine learning algorithms may be biased towards the majority class and thus produce poor predictive accuracy on the minority class. In this paper, we describe a new approach that combines boosting, an ensemble-based learning algorithm, with data generation to improve the predictive power of classifiers on two-class imbalanced data sets. In the DataBoost-IM method, hard examples from both the majority and minority classes are identified during execution of the boosting algorithm. These hard examples are then used to generate synthetic examples separately for the majority and minority classes. The synthetic data are added to the original training set, and both the class distribution and the total weights of the classes in the new training set are rebalanced. The DataBoost-IM method was evaluated, in terms of F-measures, G-mean and overall accuracy, on seventeen highly and moderately imbalanced data sets, using decision trees as base classifiers. Our results are promising and show that DataBoost-IM compares well with a base classifier, a standard benchmark boosting algorithm and three advanced boosting-based algorithms for imbalanced data sets. The results indicate that our approach does not sacrifice one class in favor of the other but achieves high predictive accuracy on both the minority and majority classes.
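The abstract reports results in terms of F-measures, G-mean and overall accuracy. The sketch below is not the authors' evaluation code. It shows why these metrics are used together by applying their common textbook definitions: F1 per class (beta = 1), G-mean as the square root of the true-positive rate times the true-negative rate, and plain accuracy. It assumes binary labels with the minority class coded as `1`. The function names (`confusion_counts`, `f_measure`, `imbalance_metrics`) are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn), treating `positive` as the minority class (assumption)."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; returns 0.0 when precision and recall are both zero."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def imbalance_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                      positive: int = 1) -> Dict[str, float]:
    """Per-class F1, G-mean and accuracy for a two-class problem (hypothetical helper)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    # Per-class recall: true-positive rate (minority) and true-negative rate (majority).
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    # Per-class precision.
    prec_min = tp / (tp + fp) if tp + fp else 0.0
    prec_maj = tn / (tn + fn) if tn + fn else 0.0
    return {
        "f_minority": f_measure(prec_min, tpr),
        "f_majority": f_measure(prec_maj, tnr),
        # G-mean is zero whenever either class is entirely misclassified.
        "g_mean": math.sqrt(tpr * tnr),
        "accuracy": (tp + tn) / len(y_true),
    }


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    always_majority = [0] * 10                    # ignores the minority class
    mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]        # one hit, one miss, one false alarm
    print(imbalance_metrics(y_true, always_majority))
    print(imbalance_metrics(y_true, mixed))
```

Both predictors reach an accuracy of 0.8 on this toy data, but the one that always predicts the majority class has a minority F-measure and G-mean of 0, while the mixed predictor reaches a G-mean of about 0.66. This is the kind of one-sided behavior the abstract's claim about "not sacrificing one class in favor of the other" is measured against.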