In DNA microarray data, the class imbalance problem occurs frequently, causing poor prediction performance for minority classes. Other characteristics of such data, such as high dimensionality, small sample size, and high noise, further aggravate this damage. In this study, we propose ACOSampling, a novel undersampling method based on ant colony optimization (ACO), to address this problem. The algorithm first applies feature selection to eliminate noisy genes. Then the original training set is randomly and repeatedly divided into two parts: a training set and a validation set. In each division, a modified ACO algorithm, a variant of our previous work, is run to filter out less informative majority samples and search for the corresponding optimal training sample subset. Finally, the statistics over all local optimal training sample subsets are summarized as a frequency list, where each frequency indicates the importance of the corresponding majority sample. Only the high-frequency majority samples are extracted and combined with all minority samples to construct the final balanced training set. We evaluated the method on four benchmark skewed DNA microarray datasets with a support vector machine (SVM) classifier; the results show that the proposed method outperforms many other sampling approaches, which indicates its superiority.
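The outer loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the modified ACO search is replaced by a hypothetical placeholder (`select_informative_subset` is an assumed name), and the train/validation splitting and SVM evaluation inside each division are omitted. Only the repeated-division and frequency-list aggregation structure is shown.

```python
# Hedged sketch of the ACOSampling outer loop: repeatedly select majority
# subsets, pool selection counts into a frequency list, and keep the
# most frequently chosen majority samples to balance the minority class.
import random
from collections import Counter


def select_informative_subset(majority_idx, k, rng):
    # Placeholder for the modified ACO search. In the actual method,
    # ants search for a majority-sample subset that maximizes validation
    # performance; here we simply draw a random subset of size k.
    return rng.sample(majority_idx, k)


def aco_sampling(majority_idx, minority_idx, n_divisions=50, seed=0):
    rng = random.Random(seed)
    counts = Counter()
    target = len(minority_idx)  # balance against the minority class
    for _ in range(n_divisions):
        # Each division would split the data into training/validation
        # sets and run the ACO search; statistics are pooled over runs.
        chosen = select_informative_subset(majority_idx, target, rng)
        counts.update(chosen)
    # Frequency list: how often each majority sample survived filtering.
    freq = {i: counts[i] / n_divisions for i in majority_idx}
    # Retain the most frequently selected majority samples and combine
    # them with all minority samples into the final balanced set.
    kept = sorted(majority_idx, key=lambda i: -freq[i])[:target]
    return sorted(kept) + list(minority_idx)


balanced = aco_sampling(list(range(100)), list(range(100, 120)))
print(len(balanced))  # 40: 20 retained majority + 20 minority samples
```

In the full method, the frequency of each majority sample serves as an importance score learned across many local optima, which makes the final selection more stable than any single undersampling run.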