C4.5: programs for machine learning. Machine Learning.
Inductive learning algorithms and representations for text categorization. Proceedings of the seventh international conference on Information and knowledge management.
Machine Learning for the Detection of Oil Spills in Satellite Radar Images. Machine Learning - Special issue on applications of machine learning and the knowledge discovery process.
MetaCost: a general method for making classifiers cost-sensitive. KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining.
Learning and making decisions when costs and probabilities are both unknown. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining.
The Case against Accuracy Estimation for Comparing Induction Algorithms. ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning.
Feature Selection for Unbalanced Class Distribution and Naive Bayes. ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning.
Tree Induction for Probability-Based Ranking. Machine Learning.
Cost-Sensitive Learning by Cost-Proportionate Example Weighting. ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining.
Naive Bayes vs decision trees in intrusion detection systems. Proceedings of the 2004 ACM symposium on Applied computing.
Editorial: special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets.
A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets.
Distributed computing in practice: the Condor experience. Concurrency and Computation: Practice & Experience - Grid Performance.
Wrapper-based computation and evaluation of sampling methods for imbalanced datasets. UBDM '05 Proceedings of the 1st international workshop on Utility-based data mining.
Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem. IEEE Transactions on Knowledge and Data Engineering.
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems).
SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research.
Learning when training data are costly: the effect of class distribution on tree induction. Journal of Artificial Intelligence Research.
The foundations of cost-sensitive learning. IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2.
Ensembles of classifiers from spatially disjoint data. MCS'05 Proceedings of the 6th international conference on Multiple Classifier Systems.
Guest editorial: special issue on utility-based data mining. Data Mining and Knowledge Discovery.
Learning Decision Trees for Unbalanced Data. ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I.
Knowledge discovery from imbalanced and noisy data. Data & Knowledge Engineering.
Improving software-quality predictions with data sampling and boosting. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans.
Analyzing PETs on imbalanced datasets when training and testing class distributions differ. PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining.
IEEE Transactions on Neural Networks.
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining.
RAMOBoost: ranked minority oversampling in boosting. IEEE Transactions on Neural Networks.
Borderline over-sampling for imbalanced data classification. International Journal of Knowledge Engineering and Soft Data Paradigms.
Classifying severely imbalanced data. Canadian AI'11 Proceedings of the 24th Canadian conference on Advances in artificial intelligence.
Evolutionary-based selection of generalized instances for imbalanced classification. Knowledge-Based Systems.
Ensembles of decision trees for imbalanced data. MCS'11 Proceedings of the 10th international conference on Multiple classifier systems.
Hellinger distance decision trees are robust and skew-insensitive. Data Mining and Knowledge Discovery.
Expert Systems with Applications: An International Journal.
Generating diverse ensembles to counter the problem of class imbalance. PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II.
MCPR'12 Proceedings of the 4th Mexican conference on Pattern Recognition.
Evaluation of sampling methods for learning from imbalanced data. ICIC'13 Proceedings of the 9th international conference on Intelligent Computing Theories.
Editorial: Parameter-free classification in multi-class imbalanced data sets. Data & Knowledge Engineering.
Learning from imbalanced data sets presents a difficult problem from both the modeling and cost standpoints. In particular, when a class is of great interest but occurs relatively rarely, as in cases of fraud, instances of disease, and regions of interest in large-scale simulations, there is a correspondingly high cost for the misclassification of rare events. Under such circumstances, the data set is often re-sampled to generate models with high minority-class accuracy. However, sampling methods face a common but important criticism: how does one automatically discover the proper amount and type of sampling? To address this problem, we propose a wrapper paradigm that discovers the amount of re-sampling for a data set by optimizing evaluation functions such as the f-measure, Area Under the ROC Curve (AUROC), cost, cost curves, and the cost-dependent f-measure. Our analysis of the wrapper is twofold. First, we report the interaction between different evaluation and wrapper optimization functions. Second, we present a set of results in a cost-sensitive environment, including scenarios with unknown or changing cost matrices. We also compared the performance of the wrapper approach against cost-sensitive learning methods, MetaCost and the Cost-Sensitive Classifiers, and found the wrapper to outperform the cost-sensitive classifiers in a cost-sensitive environment. Lastly, we obtained the lowest cost per test example that we are aware of for the KDD-99 Cup intrusion detection data set.
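The wrapper idea in the abstract can be sketched in a few lines: search over candidate re-sampling amounts and keep the one whose resulting model scores best under the chosen evaluation function. The sketch below is illustrative only, not the authors' implementation: the function names, the grid of percentages, and the simple duplication-based oversampler are assumptions (the paper considers methods such as SMOTE and undersampling, and evaluation functions such as the f-measure, AUROC, and cost).

```python
import random


def oversample_minority(X, y, minority_label, pct):
    """Duplicate randomly chosen minority-class examples until the
    minority count grows by pct percent. A crude stand-in for
    SMOTE-style oversampling, used here only to make the wrapper
    loop concrete."""
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    n_extra = int(len(minority) * pct / 100.0)
    extra = random.choices(minority, k=n_extra) if n_extra else []
    X_res = list(X) + [x for x, _ in extra]
    y_res = list(y) + [t for _, t in extra]
    return X_res, y_res


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall; one of the evaluation
    functions the wrapper can optimize."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def wrapper_search(candidate_pcts, evaluate):
    """Wrapper loop: try each candidate re-sampling amount, score the
    resulting model with the supplied evaluation function (assumed to
    train and validate internally, e.g. via cross-validation), and
    return the best amount and its score."""
    best_pct, best_score = None, float("-inf")
    for pct in candidate_pcts:
        score = evaluate(pct)
        if score > best_score:
            best_pct, best_score = pct, score
    return best_pct, best_score
```

In practice `evaluate` would re-sample the training folds by `pct`, fit a classifier, and return the held-out f-measure, AUROC, or negated cost; swapping the evaluation function is what lets the same wrapper target different cost environments.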