Neighbor-weighted K-nearest neighbor for unbalanced text corpus

  • Authors:
  • Songbo Tan

  • Affiliations:
  • Software Department, Institute of Computing Technology, Chinese Academy of Sciences, P.O. Box 2704, Beijing 100080, People's Republic of China and Graduate School of the Chinese Academy of Science ...

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2005

Abstract

Text categorization, or classification, is the automated assignment of text documents to predefined classes based on their contents. Many classification algorithms assume that the training examples are evenly distributed among the classes. However, unbalanced data sets are common in practical applications. To deal with unbalanced text sets, we propose the neighbor-weighted K-nearest neighbor algorithm (NWKNN). Experimental results indicate that NWKNN significantly improves classification performance on imbalanced corpora.
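The abstract does not give NWKNN's formulas. The sketch below shows only the general idea of a neighbor-weighted kNN. It makes several assumptions of its own. For each class, it sums the similarities of the k nearest training documents that belong to that class. It then scales that sum by a class weight that shrinks as the class grows, so that large classes do not win simply by contributing more neighbors. The weight form `(count / min_count) ** (-1 / exponent)` and the `exponent` parameter are assumptions made for illustration. The helper names `cosine`, `class_weights`, and `nwknn_predict` are hypothetical. Dense vectors stand in for real TF-IDF document vectors.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def class_weights(labels, exponent=2.0):
    """Assumed weighting: the smallest class gets weight 1, larger classes less.

    weight(c) = (count(c) / min_count) ** (-1 / exponent). The form and the
    `exponent` parameter are illustrative assumptions, not taken from the paper.
    """
    counts = Counter(labels)
    min_count = min(counts.values())
    return {c: (n / min_count) ** (-1.0 / exponent) for c, n in counts.items()}


def nwknn_predict(train_x, train_y, query, k=3, exponent=2.0):
    """Hypothetical neighbor-weighted kNN decision for one query vector."""
    weights = class_weights(train_y, exponent)
    # Rank all training documents by similarity to the query; keep the top k.
    neighbors = sorted(
        ((cosine(query, x), y) for x, y in zip(train_x, train_y)),
        key=lambda t: t[0],
        reverse=True,
    )[:k]
    # Sum neighbor similarities per class.
    scores = defaultdict(float)
    for sim, label in neighbors:
        scores[label] += sim
    # Scale each class score by its (size-dependent) class weight.
    for label in scores:
        scores[label] *= weights[label]
    return max(scores, key=scores.get)
```

The example below uses a toy training set with eight documents in class "A" and two in class "B". For the query `[1, 1]`, the three nearest neighbors are one "B" document and two "A" documents, so a plain majority vote would choose "A". With `exponent=2.0`, the "A" score is halved, and `nwknn_predict(train_x, train_y, [1, 1])` returns "B". With a very large `exponent`, every weight approaches 1, and the decision reverts to "A". This shows how the assumed weighting shifts decisions toward the smaller class.

```python
train_x = [[1, 0.7], [0.7, 1], [1, 0], [1, 0.1], [1, 0.2], [1, 0.3], [0.1, 1], [0.2, 1],
           [1, 1], [0, 1]]
train_y = ["A"] * 8 + ["B"] * 2
nwknn_predict(train_x, train_y, [1, 1], k=3, exponent=2.0)
```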