An efficient text categorization algorithm based on category memberships

  • Authors:
  • Zhi-Hong Deng;Shi-Wei Tang;Ming Zhang

  • Affiliations:
  • National Laboratory on Machine Perception, School of Electronics Engineering and Computer Science, Peking University, Beijing, China (all three authors)

  • Venue:
  • FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part I
  • Year:
  • 2005

Abstract

Text categorization is the process of automatically assigning predefined categories to free-text documents. Although a large number of text classification algorithms exist, most of them are either inefficient or too complex. In this paper, we propose the concept of category memberships, which represent the degrees to which words belong to categories. Based on category memberships, we present a simple but efficient algorithm. To evaluate the new algorithm, we conducted experiments on the Newsgroup_18828 text collection, comparing it with Naive Bayes and k-NN. Experimental results show that our algorithm outperforms Naive Bayes and k-NN if a suitable category membership function is adopted.
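
The abstract does not define the category membership function or the scoring rule, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that a word's membership in a category is the fraction of that word's training occurrences that fall in the category, and that a document is assigned the category with the largest summed membership over its words. The names `train` and `classify` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def train(labeled_docs):
    """Build a word -> {category: membership} table from (tokens, category) pairs.

    Assumed membership function: the share of a word's training
    occurrences that appear in documents of the given category.
    """
    counts = defaultdict(Counter)  # counts[word][category] = occurrences
    for tokens, category in labeled_docs:
        for word in tokens:
            counts[word][category] += 1

    memberships = {}
    for word, per_category in counts.items():
        total = sum(per_category.values())
        memberships[word] = {c: n / total for c, n in per_category.items()}
    return memberships


def classify(tokens, memberships):
    """Return the category with the highest summed membership, or None.

    Assumed scoring rule: add up the membership degrees of every
    token in the document; words unseen in training contribute nothing.
    """
    scores = Counter()
    for word in tokens:
        for category, degree in memberships.get(word, {}).items():
            scores[category] += degree
    if not scores:
        return None
    return max(scores, key=scores.get)


# Toy training data (hypothetical).
training = [
    ("the team won the match".split(), "sport"),
    ("the match ended in a draw".split(), "sport"),
    ("the chip runs the code".split(), "tech"),
    ("new code for the chip".split(), "tech"),
]
table = train(training)
label = classify("the team lost the match".split(), table)
print(label)
```

In this sketch, training is a single counting pass over the corpus and classification is one table lookup per token, which is one way a membership-based classifier can stay simple and cheap compared with k-NN, which compares each test document against the stored training documents.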