In this paper, we propose a new method to predict the risk of an event accurately from imbalanced data, in which the majority class has far more instances than the minority class, and to identify the features that are relevant to the target risk factor. To manage the trade-off between the prediction rates of the majority and minority classes, the method uses three input parameters, which supply the misclassification costs for instances of the majority and minority classes or the sensitivity threshold of the minority class. To select relevant features and to make use of prior information about the relationship between a feature and the target risk factor, a probabilistic model building genetic algorithm called RPMBGA+ is employed. Applying the proposed technique to the health checkup and lifestyle data of Toshiba Corporation, we found that it improves the sensitivity of the minority class and selects a very small number of informative features.
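The role of the cost and sensitivity parameters can be illustrated with a minimal sketch. This is not the procedure used in the paper, which the abstract does not detail; it assumes a classifier that already outputs a minority-class probability per instance, and the function names, scores, and labels below are hypothetical. With misclassification costs, the standard cost-minimizing rule predicts the minority class when its probability is at least cost_fp / (cost_fp + cost_fn). With a sensitivity target, the threshold is instead lowered until the required share of minority instances is recovered.

```python
import math


def cost_threshold(cost_fp, cost_fn):
    """Probability cutoff that minimizes expected misclassification cost.

    cost_fp: cost of misclassifying a majority-class instance (assumed name).
    cost_fn: cost of misclassifying a minority-class instance (assumed name).
    """
    return cost_fp / (cost_fp + cost_fn)


def sensitivity_threshold(scores, labels, target):
    """Highest cutoff whose minority-class sensitivity is at least `target`."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("no minority-class instances")
    # Number of minority instances that must be predicted as minority.
    needed = max(1, math.ceil(target * len(positives)))
    return positives[needed - 1]


def predict(scores, threshold):
    """Label an instance as minority (1) when its score reaches the cutoff."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity(preds, labels):
    """Fraction of minority-class instances that were predicted as minority."""
    true_pos = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    return true_pos / sum(labels)


# Hypothetical scores for eight instances; label 1 marks the minority class.
scores = [0.9, 0.4, 0.2, 0.7, 0.1, 0.3, 0.05, 0.6]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

# Missing a minority instance is assumed to be four times as costly.
by_cost = predict(scores, cost_threshold(cost_fp=1, cost_fn=4))

# Alternatively, require at least 60% sensitivity on the minority class.
by_sens = predict(scores, sensitivity_threshold(scores, labels, target=0.6))
```

Raising the minority-class cost lowers the cutoff, which trades majority-class accuracy for minority-class sensitivity. The sensitivity-based variant fixes the minority-class rate directly and lets the majority-class rate follow.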