For large, real-world inductive learning problems, the number of training examples often must be limited because of the costs of procuring, preparing, and storing the training examples and/or the computational costs of learning from them. In such circumstances, one question of practical importance is: if only n training examples can be selected, in what proportion should the classes be represented? In this article we help to answer this question by analyzing, for a fixed training-set size, the relationship between the class distribution of the training data and the performance of classification trees induced from these data. We study twenty-six data sets and, for each, determine the best class distribution for learning. The naturally occurring class distribution is shown to generally perform well when classifier performance is evaluated using undifferentiated error rate (0/1 loss). However, when the area under the ROC curve is used to evaluate classifier performance, a balanced distribution is shown to perform well. Since neither of these choices for class distribution always generates the best-performing classifier, we introduce a "budget-sensitive" progressive sampling algorithm for selecting training examples based on the class associated with each example. An empirical analysis of this algorithm shows that the class distribution of the resulting training set yields classifiers with good (nearly optimal) classification performance.
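The fixed-budget setup described above, choosing n training examples at a given class proportion, can be illustrated with a minimal sketch. This is not the article's budget-sensitive progressive sampling algorithm, whose details the abstract does not give; it only shows how a single training set of size n with a chosen minority-class fraction might be drawn from a labeled pool. The function name `sample_with_class_ratio` and all of its parameters are hypothetical.

```python
import random


def sample_with_class_ratio(examples, labels, n, minority_fraction,
                            minority_label=1, seed=0):
    """Draw n examples without replacement so that round(n * minority_fraction)
    of them carry minority_label and the rest carry any other label.

    Hypothetical helper for illustration; not the article's algorithm.
    """
    rng = random.Random(seed)

    # Split pool indices by class.
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]

    # Allocate the fixed budget n between the two classes.
    n_min = round(n * minority_fraction)
    n_maj = n - n_min
    if n_min > len(minority) or n_maj > len(majority):
        raise ValueError("pool has too few examples of one class "
                         "for this budget and class proportion")

    # Sample each class separately, then shuffle the combined selection.
    chosen = rng.sample(minority, n_min) + rng.sample(majority, n_maj)
    rng.shuffle(chosen)
    return [examples[i] for i in chosen], [labels[i] for i in chosen]
```

Calling this helper with `minority_fraction` set to the pool's own minority rate approximates the naturally occurring distribution, while `minority_fraction=0.5` gives the balanced distribution. Training a classifier on each resulting set and comparing error rate and area under the ROC curve on a held-out sample would mirror, in miniature, the kind of comparison the abstract describes.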