This paper introduces the types of leukemia and their relationships to the literature, and on that basis constructs a leukemia data set for classification from original data sources such as Cancer Gene Census, PubMed, and gene2pubmed. The resulting data set is imbalanced, and this imbalance is the object of study. After reviewing current classification methods for imbalanced data sets, the paper analyzes the problems that sampling raises for such data and proposes a mixed-sampling method for classifying the leukemia data set. The multi-class leukemia problem is converted into a set of two-class problems, and the area under the receiver operating characteristic (ROC) curve (AUC) is used to evaluate the mixed-sampling method. Experiments then compare the classification efficiency and stability of eight classification methods, and the results indicate that the mixed-sampling method achieves the best performance. The paper concludes with a summary of the work and an outlook on future work.
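The three steps named in the abstract (decomposing the multi-class problem into two-class problems, resampling each imbalanced two-class problem, and scoring with AUC) can be illustrated with a minimal sketch. The abstract does not say how the mixed-sampling method combines its sampling steps, so the version below is only an assumption: it randomly undersamples the majority class and randomly oversamples the minority class to a common size, the mean of the two class counts. All function names and the class labels "A", "B", "C" are hypothetical.

```python
import random
from collections import Counter


def one_vs_rest_labels(labels, positive_class):
    """Turn multi-class labels into a two-class problem:
    1 for positive_class, 0 for every other class."""
    return [1 if y == positive_class else 0 for y in labels]


def mixed_sample(X, y, seed=0):
    """Balance a two-class data set (assumed scheme, not the paper's):
    undersample the larger class and oversample the smaller class
    so that both end up with the mean of the two class counts."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    target = (len(pos) + len(neg)) // 2

    def resample(idx):
        if len(idx) >= target:
            # Undersampling: draw without replacement.
            return rng.sample(idx, target)
        # Oversampling: keep every example, then add random duplicates.
        return idx + rng.choices(idx, k=target - len(idx))

    chosen = resample(pos) + resample(neg)
    rng.shuffle(chosen)
    return [X[i] for i in chosen], [y[i] for i in chosen]


def auc(y_true, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one; ties count as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical three-class labels; class "A" is the small class.
labels = ["A"] * 2 + ["B"] * 5 + ["C"] * 3
X = list(range(len(labels)))
y = one_vs_rest_labels(labels, "A")
X_bal, y_bal = mixed_sample(X, y)
print(Counter(y_bal))
```

In practice each two-class problem would be resampled on the training split only, with AUC computed on untouched held-out data, so that duplicated minority examples do not leak into the evaluation.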