Risk prediction and risk factors identification from imbalanced data with RPMBGA+

  • Authors:
  • Topon K. Paul;Ken Ueno;Koichiro Iwata;Toshio Hayashi;Nobuyoshi Honda

  • Affiliations:
  • Toshiba Corporation, Kanagawa, Japan;Toshiba Corporation, Kanagawa, Japan;Toshiba Corporation, Tokyo, Japan;Toshiba Corporation, Tokyo, Japan;Toshiba Corporation, Tokyo, Japan

  • Venue:
  • Proceedings of the 10th annual conference companion on Genetic and evolutionary computation
  • Year:
  • 2008

Abstract

In this paper, we propose a new method for accurately predicting the risk of an event from imbalanced data, in which the majority class has far more instances than the minority class, and for identifying the features that are relevant to the target risk factor. To handle the trade-off between the prediction rates of the majority and minority classes, the method takes three input parameters, which supply the misclassification costs for instances of the majority and minority classes or the sensitivity threshold of the minority class. To select relevant features and to exploit prior information about the relationship between a feature and the target risk factor, a probabilistic model building genetic algorithm called RPMBGA+ is employed. Applying the proposed technique to the health checkup and lifestyle data of Toshiba Corporation, we found that it improves the sensitivity of the minority class and selects a very small number of informative features.
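
The abstract does not say how the misclassification costs and the sensitivity threshold enter the method, so the following is only a minimal sketch of the general idea, not the authors' RPMBGA+ procedure. It shows how two classifiers with the same accuracy on imbalanced data can differ sharply in minority-class sensitivity and in cost-weighted error. All names (`confusion_counts`, `sensitivity`, `misclassification_cost`, `cost_minority`, `cost_majority`) and the toy labels are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, minority=1):
    """Return (tp, fn, fp, tn), treating `minority` as the positive class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == minority:
            if pred == minority:
                tp += 1
            else:
                fn += 1
        else:
            if pred == minority:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def sensitivity(y_true, y_pred, minority=1):
    """Fraction of minority-class instances that are predicted correctly."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred, minority)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all instances that are predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def misclassification_cost(y_true, y_pred, cost_minority, cost_majority, minority=1):
    """Total cost: missed minority cases and false alarms are weighted separately.

    cost_minority and cost_majority are hypothetical parameter names.
    """
    _, fn, fp, _ = confusion_counts(y_true, y_pred, minority)
    return cost_minority * fn + cost_majority * fp


# Toy imbalanced labels (assumed data): 8 majority (0) and 2 minority (1) instances.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_majority = [0] * 10                        # never flags the minority class
flags_minority = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]   # catches both, with 2 false alarms

for name, y_pred in [("always_majority", always_majority),
                     ("flags_minority", flags_minority)]:
    print(name,
          "accuracy:", accuracy(y_true, y_pred),
          "sensitivity:", sensitivity(y_true, y_pred),
          "cost:", misclassification_cost(y_true, y_pred, cost_minority=5, cost_majority=1))
```

Both predictors reach the same accuracy on this toy data, yet only the second detects any minority-class instance. Weighting a missed minority case more heavily than a false alarm, or requiring a minimum sensitivity, is one common way to make a learner prefer the second kind of predictor.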