The decomposed k-nearest neighbor algorithm for imbalanced text classification

  • Authors:
  • Hyung-Seok Kang; Kihyo Nam; Seong-in Kim

  • Affiliations:
  • Division of Industrial Management Engineering, Korea University, Seoul, Republic of Korea; UMLogics Co., Ltd., Seongnam-city, Kyungki-do, Republic of Korea; Division of Industrial Management Engineering, Korea University, Seoul, Republic of Korea

  • Venue:
  • FGIT'12: Proceedings of the 4th International Conference on Future Generation Information Technology
  • Year:
  • 2012

Abstract

As the volume of textual data has grown exponentially, attention has turned to the need to automatically classify relevant data into one of several predefined classes. Many classification methods assume that training data are evenly distributed among all classes, yet many practical applications suffer from class imbalance. Several algorithms and re-sampling methods have been proposed to overcome the imbalance problem, but they still suffer from overfitting and information loss. This paper proposes the Decomposed K-Nearest Neighbor (DCM-KNN) algorithm. In the training step, DCM-KNN decomposes the training data into a misclassified set and a correctly classified set based on the results of traditional KNN, and finds the appropriate KNN for each set. In the test step, DCM-KNN estimates whether a test instance is more similar to the misclassified set or to the correctly classified set, and applies the corresponding KNN. Experimental results show that the proposed algorithm can achieve more accurate results under imbalanced conditions.
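The abstract does not specify the distance measure, how the "appropriate KNN" for each set is chosen, or how similarity to each set is judged. The following is a minimal Python sketch of the two-step idea under stated assumptions, not the authors' implementation. It assumes Euclidean distance on small numeric vectors (a real text classifier would typically use document vectors such as TF-IDF), a leave-one-out traditional KNN for the decomposition, a per-set k chosen from a hypothetical candidate list by leave-one-out accuracy on that set, and "similarity" measured as the distance to the nearest member of each set. The names `knn_predict`, `loo_accuracy`, `DecomposedKNN`, `base_k` and `candidate_ks` are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, x, k, exclude=None):
    """Majority vote among the k nearest (Euclidean) training points.

    `train` is a list of (vector, label) pairs; `exclude` skips one index
    (used for leave-one-out). Vote ties go to the label seen first, i.e. the
    one held by the nearest tied neighbor.
    """
    candidates = [(math.dist(p, x), label)
                  for i, (p, label) in enumerate(train) if i != exclude]
    neighbors = sorted(candidates, key=lambda t: t[0])[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def loo_accuracy(train, indices, k):
    """Leave-one-out accuracy of k-NN on the training points listed in `indices`."""
    if not indices:
        return 0.0
    hits = sum(knn_predict(train, train[i][0], k, exclude=i) == train[i][1]
               for i in indices)
    return hits / len(indices)


class DecomposedKNN:
    """Hypothetical sketch of the DCM-KNN idea; not the authors' implementation."""

    def __init__(self, base_k=3, candidate_ks=(1, 3, 5, 7)):
        self.base_k = base_k              # k of the "traditional" KNN used to decompose
        self.candidate_ks = candidate_ks  # assumed search space for each set's k

    def fit(self, train):
        self.train = list(train)
        # Training step 1: split the training data by whether a leave-one-out
        # traditional KNN classifies each point correctly.
        self.mis, self.cor = [], []
        for i, (x, y) in enumerate(self.train):
            pred = knn_predict(self.train, x, self.base_k, exclude=i)
            (self.cor if pred == y else self.mis).append(i)
        # Training step 2 (assumed criterion): for each set, pick the k with the
        # best leave-one-out accuracy on that set's points; ties keep the first k.
        self.k_mis = max(self.candidate_ks,
                         key=lambda k: loo_accuracy(self.train, self.mis, k))
        self.k_cor = max(self.candidate_ks,
                         key=lambda k: loo_accuracy(self.train, self.cor, k))
        return self

    def _nearest(self, indices, x):
        # Distance from x to the closest training point in the given set.
        return min((math.dist(self.train[i][0], x) for i in indices),
                   default=math.inf)

    def predict(self, x):
        # Test step (assumed similarity rule): x counts as "similar" to whichever
        # set holds its nearest training point; ties go to the correctly
        # classified set. Neighbors are still drawn from all training data.
        near_mis = self._nearest(self.mis, x) < self._nearest(self.cor, x)
        k = self.k_mis if near_mis else self.k_cor
        return knn_predict(self.train, x, k)


if __name__ == "__main__":
    # Toy data: five "A" points, three clustered "B" points, plus one "B"
    # point inside the "A" cluster (which the base KNN misclassifies).
    train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"),
             ((0.5, 0.5), "A"), ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"),
             ((0.6, 0.6), "B")]
    model = DecomposedKNN().fit(train)
    print(model.mis, model.k_mis, model.k_cor)  # [8] 1 3
    print([model.predict(p) for p in [(0.2, 0.3), (5.5, 5.5), (0.62, 0.62)]])
    # ['A', 'B', 'B']
```

On this toy data, only the embedded minority point lands in the misclassified set, so a test point right next to it is routed to the k = 1 classifier and labelled "B", whereas a plain k = 3 KNN over the same training data would vote "A". Whether the paper selects k per set in this way, or uses a different similarity test, is not stated in the abstract.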