EasyEnsemble and Feature Selection for Imbalance Data Sets

  • Authors:
  • Tian-Yu Liu

  • Venue:
  • IJCBS '09 Proceedings of the 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing
  • Year:
  • 2009

Abstract

Many labeled data sets have an unbalanced representation among their classes. When the imbalance is large, classification accuracy on the smaller class tends to be lower. In particular, when a class is of great interest but occurs relatively rarely, such as cases of fraud or instances of disease, it is important to identify it accurately. Here we propose a novel algorithm named MIEE (Mutual Information based feature selection for EasyEnsemble) to treat this problem and improve the generalization performance of the EasyEnsemble classifier. Experimental results on the UCI data sets show that MIEE obtains better performance than asymmetric bagging and EasyEnsemble.
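
The abstract does not say how MIEE combines its two ingredients, so the following is only a minimal sketch under stated assumptions, not the author's method. It assumes discrete features, ranks them by their mutual information with the class label, and then draws EasyEnsemble-style balanced subsets (all minority examples plus an equally sized random sample of the majority class). The function names, the toy data, and the choice to select features before sampling are all hypothetical, and the base learner trained on each subset is omitted.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_features(X, y, k):
    """Indices of the k features with the highest mutual information with y."""
    scores = [
        mutual_information([row[j] for row in X], y)
        for j in range(len(X[0]))
    ]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def balanced_subsets(y, minority_label, n_subsets, seed=0):
    """EasyEnsemble-style sampling (assumed form): every subset keeps all
    minority indices and adds an equally sized random draw, without
    replacement, from the majority class."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    return [
        minority + rng.sample(majority, len(minority))
        for _ in range(n_subsets)
    ]


# Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
X = [[0, 1], [0, 0], [0, 1], [0, 0], [0, 1], [0, 0], [1, 0], [1, 1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

selected = select_features(X, y, k=1)
subsets = balanced_subsets(y, minority_label=1, n_subsets=3)
```

In a full pipeline, one classifier would be trained on each subset using only the selected features, and their outputs combined. That step is left out here because the abstract gives no details about it.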