The effect of imbalanced data sets on LDA: A theoretical and empirical analysis

  • Authors:
  • Jigang Xie; Zhengding Qiu

  • Affiliations:
  • Institute of Information Science, Beijing Jiaotong University, Beijing 100044, PR China (both authors)

  • Venue:
  • Pattern Recognition
  • Year:
  • 2007

Abstract

This paper demonstrates theoretically that imbalanced data sets degrade the performance of linear discriminant analysis (LDA). Experiments confirm the analysis: when several sampling methods are used to rebalance the imbalanced data sets, LDA performs better on the rebalanced data than on the original imbalanced data.
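
As a rough illustration of the rebalancing idea, the sketch below applies random oversampling to a two-class scalar data set and compares the decision threshold of a textbook one-dimensional LDA before and after. The abstract does not name the sampling methods the authors used, so random oversampling is an assumption here, and the helper names `random_oversample` and `lda_threshold_1d` are hypothetical. The threshold formula is the standard equal-variance LDA rule with priors estimated from class frequencies; it is not necessarily the argument made in the paper.

```python
import math
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class has as many samples as the largest class.
    (Hypothetical helper; one common rebalancing choice.)"""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def lda_threshold_1d(samples, labels):
    """Decision threshold of two-class LDA on scalar data (labels 0 and 1),
    with a pooled variance and priors estimated from class frequencies.
    When the classes are unequal in size, the log term moves the
    threshold toward the mean of the smaller class."""
    x0 = [x for x, y in zip(samples, labels) if y == 0]
    x1 = [x for x, y in zip(samples, labels) if y == 1]
    n0, n1 = len(x0), len(x1)
    mu0, mu1 = sum(x0) / n0, sum(x1) / n1
    ss = sum((x - mu0) ** 2 for x in x0) + sum((x - mu1) ** 2 for x in x1)
    var = ss / (n0 + n1 - 2)  # pooled within-class variance
    return (mu0 + mu1) / 2 + var * math.log(n0 / n1) / (mu1 - mu0)


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration): 200 majority
    # samples around 0 and 20 minority samples around 2.
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
    xs += [rng.gauss(2.0, 1.0) for _ in range(20)]
    ys = [0] * 200 + [1] * 20

    bx, by = random_oversample(xs, ys, seed=0)
    print("class counts before:", Counter(ys))
    print("class counts after: ", Counter(by))
    print("threshold before:", lda_threshold_1d(xs, ys))
    print("threshold after: ", lda_threshold_1d(bx, by))
```

On imbalanced data the threshold sits closer to the minority class mean, so more minority samples fall on the wrong side. After oversampling the log term vanishes and the threshold returns to the midpoint of the two class means.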