ACOSampling: An ant colony optimization-based undersampling method for classifying imbalanced DNA microarray data

Authors:
Hualong Yu; Jun Ni; Jing Zhao
Affiliations:
School of Computer Science and Engineering, Jiangsu University of Science and Technology, Mengxi Road No.2, Zhenjiang 212003, China; Department of Radiology, Carver College of Medicine, The University of Iowa, Iowa City, IA 52242, USA; College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
Venue:
Neurocomputing
Year:
2013

Abstract

In DNA microarray data, the class imbalance problem occurs frequently and causes poor prediction performance for minority classes. Other characteristics of such data, including high dimensionality, small sample size, and high noise, intensify the damage. In this study, we propose ACOSampling, a novel undersampling method based on ant colony optimization (ACO), to address this problem. The algorithm first applies feature selection to eliminate noisy genes. The original training set is then randomly and repeatedly divided into two groups: a training set and a validation set. In each division, a modified ACO algorithm, a variant of our previous work, filters out less informative majority samples and searches for the corresponding optimal training sample subset. Finally, the statistics from all locally optimal training sample subsets are compiled into a frequency list, in which each frequency indicates the importance of the corresponding majority sample. Only the high-frequency majority samples are extracted and combined with all minority samples to construct the final balanced training set. We evaluated the method on four benchmark skewed DNA microarray datasets using a support vector machine (SVM) classifier; the results show that the proposed method outperforms many other sampling approaches.
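
The final frequency-based selection step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the per-division ACO search is replaced by a hypothetical placeholder (`random_subset_search`) that picks majority samples at random, and the rule of keeping as many majority samples as there are minority samples is an assumption, since the abstract does not state the cutoff. The function names and sample identifiers are likewise illustrative.

```python
import random
from collections import Counter


def build_balanced_set(majority_ids, minority_ids, local_subsets):
    """Keep the most frequently selected majority samples plus all minority samples.

    local_subsets holds one collection of majority sample ids per division,
    i.e. the subset that the search returned for that division.
    """
    counts = Counter()
    for subset in local_subsets:
        # Count each sample at most once per division.
        counts.update(set(subset))
    # Rank majority samples by descending frequency; ties are broken by id.
    ranked = sorted(majority_ids, key=lambda i: (-counts[i], i))
    # Assumption: keep as many majority samples as there are minority samples.
    kept = ranked[:len(minority_ids)]
    return sorted(kept) + sorted(minority_ids), counts


def random_subset_search(majority_ids, k, rng):
    """Hypothetical stand-in for the per-division ACO search (random choice only)."""
    return rng.sample(majority_ids, k)


rng = random.Random(0)
majority = list(range(10))      # illustrative majority sample ids
minority = [100, 101, 102]      # illustrative minority sample ids
subsets = [random_subset_search(majority, 3, rng) for _ in range(50)]
balanced, freq = build_balanced_set(majority, minority, subsets)
```

In a faithful version, each call to the placeholder would be replaced by a search that scores candidate subsets on the validation set of the corresponding division.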