Reducing overfitting of AdaBoost by clustering-based pruning of hard examples

  • Authors:
  • Dae-Sun Kim;Yeul-Min Baek;Whoi-Yul Kim

  • Affiliations:
  • Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea;Hanyang University, Seoul, Korea

  • Venue:
  • Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

In order to solve the problem of overfitting in AdaBoost, we propose a novel AdaBoost algorithm using K-means clustering. AdaBoost is known as an effective method for improving the performance of base classifiers both theoretically and empirically. However, previous studies have shown that AdaBoost is prone to overfitting in overlapped classes. In order to overcome the overfitting problem of AdaBoost, the proposed method uses K-means clustering to remove hard-to-learn samples that exist on overlapped region. Since the proposed method does not consider hard-to-learn samples, it suffers less from the overfitting problem compared to conventional AdaBoost. Both synthetic and real world data were tested to confirm the validity of the proposed method.