Adapting centroid classifier for document categorization

  • Authors:
  • Songbo Tan; Yuefen Wang; Gaowei Wu

  • Affiliations:
  • Key Laboratory of Network, Institute of Computing Technology, Chinese Academy of Sciences, China; Information Center, Chinese Academy of Geological Sciences, China; Key Laboratory of Network, Institute of Computing Technology, Chinese Academy of Sciences, China

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 12.05

Abstract

In the information retrieval community, the Centroid Classifier has been shown to be a simple yet effective method for text categorization. However, it often suffers from model misfit (or inductive bias) incurred by its underlying assumption. Various methods have been proposed to address this issue, such as Weight Adjustment, Voting, Refinement and DragPushing. However, these existing methods rely on only one criterion, namely training-set error. Research in machine learning indicates that methods based solely on training-set error cannot guarantee the generalization capability of base classifiers on unseen examples. To overcome this problem, we propose a novel Model Adjustment algorithm, which makes use of training-set margins as well as training-set errors. Furthermore, we prove that, for a linearly separable problem, the proposed method converges to the optimal solution after a finite number of updates using any learning parameter η (η > 0). An empirical assessment conducted on four benchmark collections indicates that the proposed method achieves slightly better prediction accuracy than an SVM classifier and also runs faster.
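
The abstract does not give the update rule, so the following is only a minimal sketch of the general idea: a plain centroid classifier followed by a perceptron-style adjustment that fires when a training document is misclassified or falls inside a margin. The function names (`train_centroids`, `predict`, `adjust`), the `margin` and `epochs` parameters, and the specific push/pull update are all assumptions made for illustration; they are not the paper's Model Adjustment algorithm. Only `eta` corresponds to the learning parameter η mentioned above.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_centroids(docs, labels):
    """Plain centroid classifier: one normalized sum vector per class."""
    sums = {}
    for d, y in zip(docs, labels):
        acc = sums.setdefault(y, [0.0] * len(d))
        for i, x in enumerate(d):
            acc[i] += x
    return {y: normalize(v) for y, v in sums.items()}


def predict(centroids, doc):
    """Assign the class whose centroid has the largest dot product with doc."""
    return max(centroids, key=lambda y: dot(centroids[y], doc))


def adjust(centroids, docs, labels, eta=0.5, margin=0.1, epochs=10):
    """Hypothetical error-and-margin adjustment (assumed rule, not the paper's).

    Requires at least two classes. A document triggers an update when its
    score for the true class does not beat the best rival class by `margin`,
    which covers both outright errors and small-margin cases.
    """
    cents = {y: list(v) for y, v in centroids.items()}
    for _ in range(epochs):
        updated = False
        for d, y in zip(docs, labels):
            rival = max((c for c in cents if c != y),
                        key=lambda c: dot(cents[c], d))
            if dot(cents[y], d) - dot(cents[rival], d) < margin:
                # Pull the true-class centroid toward the document
                # and push the rival centroid away from it.
                cents[y] = [w + eta * x for w, x in zip(cents[y], d)]
                cents[rival] = [w - eta * x for w, x in zip(cents[rival], d)]
                updated = True
        if not updated:
            break  # every training document already satisfies the margin
    return cents


# Toy usage with two-dimensional, unit-length "document" vectors.
docs = [normalize(v) for v in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9])]
labels = ["a", "a", "b", "b"]
model = adjust(train_centroids(docs, labels), docs, labels)
```

With `margin=0` the trigger condition reduces to a purely error-driven update, which is the single-criterion setting the abstract criticizes; a positive margin also adjusts for documents that are classified correctly but only narrowly.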